PREFACE


This handbook is based on an eight-year program of research
and development supported by the Agency for International Development,
and carried out by the American Institutes for Research. The
opinions expressed as a result of the study are not necessarily those
of the Agency for International Development, however. The objectives
were, first, to devise techniques of aptitude testing that could be applied
in cultures in which standard ability tests are not fully effective; and,
then, to assist in the application of these techniques to human resource
development programs in the developing countries.

The research proceeded through four major phases, as follows: 

Phase I was a feasibility study to determine whether the prob- 
ability of devising effective testing methods in these countries was 
sufficiently high to warrant the fairly substantial developmental invest- 
ment that might be required. It was begun in July, 1960, in Nigeria, 
which had been selected as a suitable model country, and it lasted 
approximately nine months. The results included a set of principles 
for constructing aptitude tests in the Nigerian culture, a series of 
twelve prototype tests that had been developed in accordance with 
these principles for purposes of experimental evaluation, and evalu- 
ative data that showed these tests to be reasonably effective. The 
development of appropriate testing devices appeared to be an entirely 
feasible proposition. 

Phase II consisted of the further developmental research that
was necessary to translate these prototype tests into operational
forms and to devise measures of other important aptitudes that had
not been included in the feasibility study. It lasted from October,
1961, to June, 1963, and resulted in a set of twenty-one ability tests,
which we termed the I-D Aptitude Series. As part of the research,
these tests were validated and normed for a wide range of practical
applications throughout Nigeria; by the end of Phase II, a number of
operational testing services were actually being provided to education
and training establishments, and to private employers. Additional
validity studies were carried out in Ghana, Liberia, Tunisia, and Mali
as a first step toward extending the program to other African countries.

Phase III was the first stage of the institution-building effort
that, through a variety of successor projects, is still being carried
forward today. It began in November, 1963, with the objective of
establishing a Nigerian center for aptitude testing that could take full
responsibility for the continuing development of the program, under
the supervision of a Nigerian professional staff and with entirely local
financing. In July, 1964, the Nigerian Aptitude Testing Unit began
operations as an affiliate of the West African Examinations Council;
and in January, 1966, it became an integral part of that organization

for Test Development and Research. Under its expanded
charter, the unit was responsible for all of the council's test
development programs (achievement as well as aptitude)
provide services outside Nlgem toVe , Sleets 
of Ghana, Gambia, and Sierra iSone Conmr^^^n'’ '““Hrlcs 
developing world. It tt^s pTCeSfe^'l^" <>' •>■<> 
Nigerian studies bi these countries an^ fspHeatlon of the 
period May, 1966, to June, 1968, In’^ailT 

had agreed to participate as the new m^ei Thailand, which 

were, first, to develop a testing program In The objectives 
much more of the research had to be carried out in remote rural
locations. That techniques effective for second-language examinees
in rural Africa would be adequate for most applications elsewhere is,
in retrospect, not really surprising.

At the same time, however, these factors also suggest that 
certain of the adaptations incorporated in the I-D techniques may not 
be necessary for adequate testing in a less challenging setting. Some 
of the more economical features of the standard testing approaches 
can perhaps be reintroduced for greater efficiency in such locations. 
Preliminary studies of this possibility were included in the Phase IV 
research, but the specific conditions under which a certain modification 
is or is not required have not yet been determined. 

Similarly, the highly practical orientation of the research does 
not permit us to say that the I-D techniques are the only effective 
approach to test adaptation, or even the one that is best. When we 
found that a certain procedure provided results of sufficient accuracy 
for operational use, we generally stopped that aspect of the research 
and did not look for something better. And we have therefore felt
constrained to temper the typical "Thou shalt ..." style of a prescriptive
handbook with the more modest "It was found that ..."
approach of a descriptive report.

In the planning of the handbook itself, two major decisions had 
to be made. The first concerned the audience to which the content 
should be addressed. The national development planner, the technical 
assistance official, the educator or the employer, the senior testing 
specialist, the recent graduate of advanced measurement courses (all
of whom play an important role in the expansion of testing in a developing
country) would clearly be interested in different aspects of the
research, and would find maximum utility in different kinds and depths
of discussions. The second and somewhat related decision concerned
the degree to which the handbook should be self-sufficient in presenting
the essentials of testing, including techniques (e.g., statistical analyses) 
that are unaffected by cultural variations, and the degree to which it 
should concentrate on the unique aspects of testing in a developing 
country, requiring the reader less familiar with measurement prac- 
tices to consult other sources. 

On the first issue, we decided that instead of addressing the
entire handbook to the interests of a single audience or to a "happy
mean" of the interests of a number of different audiences, we would
try to organize the material into reasonably modular chapters and
address each to the audience for which it is mainly intended. Thus,
On the second issue, we decided not to attempt to make the
handbook entirely self-sufficient. Basic measurement concepts, such 
as reliability, are not explained; for adequate definitions of these 
concepts other sources must be consulted. The appropriate sources 
are the more widely used textbooks on the essentials of measurement, 
which we have found to be generally available in the developing
countries.


A number of standard techniques (i.e., techniques not affected 
by cultural variations) are described in the technically oriented 
chapters, however. This was considered necessary whenever we used 
a technique that we judge superior but that is not the one most typically 
used by other practitioners in the field of testing; or when a certain 
technique is described only in sources that are not readily available
in the developing countries. Such discursions from the culture-tied 
aspects of testing will be found mainly in the Chapter 3 and Chapter 7 
discussions. 
CHAPTER


THE IMPORTANCE
OF
"CULTURAL" FACTORS
IN TESTING


Most of the topics discussed in this handbook are addressed to 
the methodological questions of how tests can be adapted for use in a
developing country. For it has been in the design of suitable testing 
procedures that the AID/AIR research has made the most significant 
advances, and can offer the most useful suggestions. Principles, 
techniques, and practical applications all are discussed in detail. 

Yet, the how issue is only one of the questions that arises when 
a new test development project is being considered. Even more 
fundamental is whether the testing that is to be done actually calls 
for especially adapted procedures, and whether (if in theory it does) 
the practical gains will in fact justify the necessary investment. The 
first decision that has to be made is whether a proposed test development
project is worth doing at all.

Part I of the handbook is addressed to these basic whether ques- 
tions. Chapter 1 examines the legitimate reasons for test adaptation; 
Chapter 2, the related cost-effectiveness implications. 


SOME EXTREME VIEWS ON 
TEST ADAPTATION 

Many of the educators and personnel officers who operate 
testing programs in the developing countries hold widely divergent 
views on the merits of test adaptation. At the one extreme there are 
those who look mainly at the vast environmental differences between
the developing countries and the highly industrialized nations, and
conclude that any test designed for the one ipso facto cannot serve
the other. At the other extreme there are those who attach greater 
importance to the fact that the skills needed in both developed and
developing countries are exactly the same, and who fear that "simplified"
tests will hamper them in producing equally high levels of skill
in their own populations. The former would exclude practically all
of the classic testing procedures from use in a developing country,
since these were designed in and for the Western cultures; the
latter would not accept anything else, since this would be a
tacit acceptance of lower performance standards.

elaborate ufL'ir/h"* be useful to 

course of the projKt. '“‘‘““‘"B examples occurred during the 
use of a single test based on it ®if®usly objected to the continued
psrts of the country wS sooL^ffc uludents from dttferent

Ulttercnt languageT^earninTo^hT ‘ had quite 

lake these dSwences into lest did not 

different English tests each desi * lyus for a series of 

language structure, so* that each ^ ^ certain indigenous 

Ibis series that look acM^ro, th?„ 

tongue. Though the test develonmtii °'™ "dtive 

assistance almost certainly could be ohta? ®t“®™l 

assistance be sought. obtained; and he moved that such 


eutulnatlon, they insisted, Is simnfi to d^’i “t un EngUsh 

student has mastered the languace^o a ''hether or not the 

ground Is irrelevant to this obSetoe ? noooptable degree. His baefc- 
dcyeloped. The appropriate aSaeh Th be 

rc^ the U.K. standard of “'“bers fell, was to 

‘ontlng English profWency ^'“1^^’’'““"" 
The second example is more complex, in that there was not just
one but a wide range of possible modifications to be considered. 

The manager of the local branch of a large American firm was 
trying to recruit trainees for advanced technical positions critical to 
his company's operations. He asked one of the American professors 
teaching at the local university for help; and the professor obliged by 
recommending fourteen students whom he considered exceptionally 
well-suited for these positions. Each of the students came in for an 
interview; each was given the firm's standard selection test; each 
failed the test; and each was rejected. 


In the postmortem discussions, the professor took strong exception
to the selection procedure, and pointed out three specific shortcomings
in the approach that had been used. The first was that certain
parts of the company's test included problems of a type unknown in
the local setting and therefore not suitable for local use; these should
have been dropped. The second was that in the ® ' 

not even the most obvious of cultural adaptations had been attempted^ 
The mathematical exercises Involving dollars and ’ 

clearly should have been changed to the local currency a 
used. And the third was that evaluating the ^ 

basis of American norms (which had Indeed been the p . , 

completely absurd. Given the huge differences m the o' 

American and local examinees, no single standard could possibly
serve as a meaningful yardstick for both sets of scores.


The branch manager disagreed. The job ‘o be Performed m 
this country, he said, is identical in all f '° job that me^ 

company's American employees are per^rming . ^ 

Exactly the same kinds of Md skiUs ’ j measure 

locations. And, since <^6 comp^y's tus' bad 

precisely these kmds of H the requisite skills 

as it is there. He was prepared to TinitPd stales but this 

are less hiSb'y bere , applicants’ would have 

meant only that an unusually ijrg iob Changing the test or 

to be tested to find P^oplu ^ simplify his present recruiting 

adjusting the uorP-b 'yob'O.be pseple who subse- 

^rtrcournTre^t^e ^a^, anIwL therefore short-sighted. 


and so the students were not employed, and the companjt con- 
And so the smoen „„ie affected by the company's 

tinned testing. The num l^^P 

decision m «b“ 0 le ^re similarly debated, 

arise in many large lesiuib w* t» » 
The admission of foreign students to American universities is another,
perhaps more familiar illustration. Should one argue that the foreign 
student will be expected to perform at the same level as his American 
classmates, and therefore use the university's standard admission 
procedures? Or should one consider the differences in the backgrounds 
of the foreign applicants, and try to make suitable adaptations? And
if one does decide to adapt the test, how much adaptation is needed? 

alfected by inlercultural 


procei^ tata "vide the testing 

(and quite dlltercntl , “'“v vomponents, and to examine their separate 
raUoLlc 21 the components are 1) the test 

isiussed « f ° mechanics, 

.reate'r Si4\'‘de\i.tl“LXs!" ““ 


THE TEST RATIONALE 

does not'p”ck”lhe test°Droblam' purpose, he obviously 

that ho uiSwfute ‘hboo 

exercises so that they will have \ **0 constructs the lest 

real-Ufe sklU that the test is m 'bEtcal relationship to the 

is called the test rationale. ousure. And this logical relationship 

he begins by analyzing Uie Iob*of carpentry course, for example, 

oI skills might make certain Indivie^?^^^^^ which types 

He might conclude tLt c^rtaifsUnS otters. 

claUy Important; so. he woS^d.^, movements are espe- 

movcmcnls as closely as wsslble ^ d duplicates these 

dlUcrcnces in this skill ai^ne^t^tS' measure Individual 
,7‘?“™=''‘i' hetween the muscl^1ilji,!f^,“®‘"'“°" Pv»cedure. The 
test and the muscle movements Programmed into the 

Ihe underlying test rational^ ^ *““'<< constitute 

['““jdhly conclu™thTt tte rltio^**"’ “"i' eonstructor can 
. ^eal relationship he posited r indeed sound— that the 

he abTet He assume" 

” effective. In the^aW^* ^‘atlonale explains why 

^ selection «ample, be would say that th7~^ 

s that sitiiied carpentry r^^iS^ muscle move- 

■quires. And he would not hesitate to 
try the same test again for other jobs that seem to require similar
kinds of physical manipulations. 


Thus, the "standard" American or European tests for a given 
purpose have not only a statistical pedigree but also a rational explanation.
The first question that must be asked from the point of view of
test adaptation is whether these rationales are equally sound in a de- 
veloping country. For, if they are not— i.e., if using a certain type of 
test cannot be justified on rational grounds— it would clearly be waste- 
ful to examine the appropriateness of its content or other specific 
details. The need would be not for adaptation but for an entirely 
different approach. 


On this point, the AID/AIR findings are reasonably conclusive. 
Most of the standard test rationales are equally valid in a developing
country. Each of the I-D tests is based on a standard test rationale,
and each is effective for the specific applications for which this
rationale is intended. Though many of the physical characteristics of
the standard tests had to be changed to fit local conditions, the underlying
rationales remained the same. If a test of ability A was known
to predict success in curriculum X in the United States, an appropriately
adapted test of ability A would consistently provide comparable results
for curriculum X in an entirely different cultural setting. 


This finding has important implications for the testing specialist,
in that it permits him to apply the vast literature of 
work, irrespective of the cultural setting. By beginnmg i 
already known, as elaborated in Chapter 3, he can often effect a substantial
saving. The wheel does not have to be reinvented.

For the administrator or funder of test development programs, 
there are similar implications. In accordance 
finding, he can appropriately use the degree to ^ 

pr^ect builri^ o^iertL^^rwould^U"^^^^ rationales before checking 
“ity genenaly should not he underhUten. 

in the early days 

Alrlea, much elsy to administer to naTve 

were used “^‘y^Ys fs /ertops the main reason that this subslanllal 
exammees. Ms is pe P Beginning with excellent rationales 

^d theniry°lng"To solve the dUllcultles In applying them Is likely to 
be much more productive. 
Similarly, low priority should be assigned to a project that
would analyze a standard job or curriculum to determine the specific
abilities that it requires in a particular country. If the component 
skills have already been identified in studies elsewhere, any local 
differences that may be found by repeating these studies will probably 
not justify the time and effort required.* The standard findings 
should be applied, at least for the first set of trial testing procedures. 

Exceptions to the universal applicability of the established test 
rationales occur rarely. But the errors that can result from the use
of an inappropriate rationale are so sizable that the logic of each
proposed rationale should nevertheless be double-checked before it 
is used, to ensure its suitability in the local situation. 

The use of achievement in earlier school courses as a predictor 
of performance in more advanced courses is one case in point. In the
educational systems of most developing countries, the most crucial 
selection decisions are made at the transition from primary to 
secondary school, which is the make-or-break hurdle for most of the 
country’s youngsters. And, typically, language and arithmetic achieve- 
ment tests are used as the major indexes for these admission decisions. 
Is this an effective procedure? 

Looking at the results of American testing experiences, one 
finds that predictors of scholastic success better than achievement 
tests have not yet been discovered. For most applications, the use 
of achievement measures is one of the indicated selection procedures.
But a second look will show also that this experience has been based
largely on the testing of high school students for college admission,
since selection for secondary schools is not important in the American
system. And so, before investing in these types of tests, it is necessary
to ask whether the rationale for using achievement tests, however
thoroughly validated for university admission in the United States, is
logically generalizable to secondary school admission in a developing
country.


The rationale for achievement tests as scholastic predictors
is based on three major assumptions. The test constructor reasons


that difference that does arise is in sklU requirements 

la other ^ checked for certain jobs 

spec Without 
1. if the educational experiences that the applicants were provided
by their past schooling have been approximately the same;

2. if the differences in the applicants' relative achievement,
given these equivalent experiences, were the result of certain differences
in their individual abilities and characteristics;


and 

3. if the advanced courses for which they are applying will 
require the same abilities and characteristics; 

then the applicants should continue to achieve at different levels, just 
as before. Having through these assumptions established a link between
their past performance and their future potential, he can logically 
select the highest achievers for admission to the advanced courses. 

The second and third assumptions are as reasonable in a develop- 
ing country as in the United States. But the first assumption is not. 

For as heterogeneous as the American high schools are in the quality 
of their instruction— and this has, in fact, been the major concern in 
U.S. admission procedures— the primary schools of the developing 
countries are much more heterogeneous still. In many countries, 
the use of achievement tests can be expected to select not the students
with the highest potential, but those who happened to attend the better 
primary schools. Many of the country's most talented youngsters 
may well be denied admission if this "standard" procedure is the one
adopted. 

Similar anomalies can be found in a number of other traditional 
rationales when applied in different cultures. It can easily be deduced, 
for example, that the very high predictive power that can normally 
be expected from English vocabulary tests will be greatly reduced 
when English is the second language of the examinees; that abstract 
reasoning tests will probably not measure "intelligence" in settings 
in which this skill is not practiced in preschool days; and that cross-cultural
comparisons of mental ability simply cannot be made.1
Checking the applicability of the test rationale to local conditions is 
always worthwhile. 

Yet, even when such anomalies are discovered, the principle 
of beginning with models of excellence should continue to be applied. 

If one of the established test rationales does not fit, there is probably
another that can be applied. 
Thus, having ruled out achievement tests as predictors of per-
formance at the secondary school level, one should consider the alter- 
native approaches that have proved useful, to see if any of these are 
better suited to local conditions. In this particular instance, he will 
in fact find two suitable rationales that, used in tandem, provide a
highly effective selection procedure. 


The first is the rationale for screening tests, which are designed
to eliminate unfit applicants rather than to identify those who are
especially able. Here, the test constructor reasons that

bi ^ eiven curriculum must have acquired certain 
basic knowledge and skills in order to keep up wilh IheTst^Stlon, 


by slutou'ihrdo‘“,‘’‘' ‘or remedial learning 

^ «udenu Who do not possess these skills when Ihey first enter the 


«lecff N^'mller'^hrSthei^ background should be 

fall as a result of ihelr poor®^r^SiaX° 

selection in an^CT^eto^e^m^i'^ applied lor secondary school 
and skill that the secon^ry schoS' co'^ specHic items of knowledge 

“d If a test that raeasurr7»h!c A requires can be identified, 

who have not mastered them to t constructed, applicants 

rejected. Such a screelTte^t wnw ^ 
of achievement; but it wUl dlffo f ^ to be a test 

wanting above la two Irapoi^f achievement tests found 

will be based not on the prlmarv k h*® f content 
“enl. Of the secondly achX™? “‘ cequlre- 

Ibe lange of skilU hSg ra^sur^'*^ “"cow considerably 

not to select ihe top 10 percent “ *“1 be used 


“ is necessary neat to selKt “‘la type of tesl, 

Mse 5,““ 'bt or sever,^ir^fu Passed the inlUal hurd 

Silch r ' “‘^'PP'-'ate rauonale ir,LT i} ^ fan this pur. 

»hlch reasor.3 that- “* *”1 °t the seholasllr up 
1. if a student’s achievement in a given course depends in part
on the kinds of mental operations this course requires, 

2. if there are tasks that he already knows how to perform
which require essentially the same operations, 

and 

3. if he has had sufficient practice in these tasks to have reached 
a fairly stable proficiency level, 

then this level should be indicative of the level he will attain in performing
the new tasks the course will expect him to master. The
prediction is quite simply that those who are able to perform related 
tasks better now will also perform better in the course for which they 
are being considered. (It will be noted that the earlier example of 
selecting trainees for carpentry courses used the identical rationale, 
except that it was applied to "physical" rather than "mental" operations.)

To apply this rationale in a developing country it is necessary to 
find related tasks that the examinees already know how to perform, 
which is a problem that will be discussed at length in later sections. 

For now, it suffices to say that this can usually be done; and that 
the above two-stage selection procedure has already been tried in at 
least one developing country and found highly effective. 
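The two-stage procedure described above can be sketched in a few lines of code. This is only an illustration under stated assumptions: the applicant records, the score scales, the cutoff of 60, and the names `Applicant` and `two_stage_select` are all hypothetical and do not come from the AID/AIR research. The sketch shows the logic of the two rationales used in tandem: the screening test only eliminates applicants who lack the prerequisite skills, and the aptitude test alone ranks those who remain.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    name: str
    screening_score: float  # mastery of prerequisite knowledge and skills
    aptitude_score: float   # proficiency on related, already-practiced tasks


def two_stage_select(applicants, screening_cutoff, places):
    """Stage 1 rejects applicants below the screening cutoff.
    Stage 2 ranks the survivors by aptitude score and fills the places."""
    qualified = [a for a in applicants if a.screening_score >= screening_cutoff]
    ranked = sorted(qualified, key=lambda a: a.aptitude_score, reverse=True)
    return ranked[:places]


# Hypothetical scores, invented for illustration only.
applicants = [
    Applicant("A", 72, 55),
    Applicant("B", 40, 90),
    Applicant("C", 65, 80),
    Applicant("D", 81, 61),
]
selected = two_stage_select(applicants, screening_cutoff=60, places=2)
selected_names = [a.name for a in selected]
print(selected_names)
```

Note that applicant B, who has the highest aptitude score, is rejected at the first stage, and that the screening score plays no part in ranking those who pass it.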

This one application has been described in some detail to illustrate
the importance of analyzing the test rationale in project evaluation.
It is not expected that the administrator will himself carry out
this analysis in the depth necessary for meaningful conclusions; but 
he should require that it be done by the specialists who prepare the 
project proposal. Specifically, he should ask for the following: 

1. A written statement of the rationale that will be followed in 
developing the proposed testing procedures; 

2. A review of the prior applications of this rationale in other
locations, and of its effectiveness relative to other approaches; and 

3. An analysis of the suitability of the assumptions made in 
this rationale in the light of local conditions. 

This will usually provide him with a sound basis for judging the 
technical merits of the proposed undertaking; and, as a by-product 
of the additional thinking that will have been done at the specialist
level, result in generally better proposals. 


THE SPECIFIC TEST CONTENT 

The above conclusions bear directly on the fundamental disagree- 
ment noted at the beginning of this chapter, between those who look 
mainly at the similarities in the job requirements and those who give
more weight to cultural variations. In almost all cases, the treatment
of the standard test rationales should follow the view of the former.
The jobs are largely the same, do encompass the same kinds of skill,
and do call for the same types of testing procedures.

ront.rf rationales are used to generate the specUlc 

to the o nnn*'n w “erclses, however, the approach must shUt 
rationale “PPUratlon of each 

Hon Ot h conditions. The "best" applies- 

tod som time" locations can 

And It Is Imnortant t bllterent sets of test questions, 

respective se'tC\‘rw°h‘fch'i5?»^^ 

studleXt MkhaeTcofe P"*”' the 

interested L the ah C">a «as 

variety of slllrti™ ™ "o»'l™tlon," which has been used for a 

the dimensions ormea^Jera^llte rt''eomX“'°h 
that the Kpelle scorPd f,!. something by sight. He found 
problems of estimation that a« norms on all of the standard 

But, When given ^ American culture, 

cups of rice there are In a^l^nm ^''°^^®'”“®stlmatlng how many 
accurate as a comparison sample of U A 

almIVtoUe‘otX?S.‘et Jr, ““‘“‘y' ‘'■P-'P-- P"P 

With a test based on the length of stra^M American applicants 

gr^p of Kpelle applicants ^ »o >>aal of a 

f P™'- hi both Instances the test 't'* ™ ''“'“"'a of rice in 

Identical assumptlon-that the estlmJll ‘ha 

similar to the estimanra„ * f task posed by the test is 

hat the apecmrjjjr^erctst Job operations, 

be the same. ‘ “ ““ad in the two locations would not 

‘‘‘rPat?irom™e^"t r"uo‘ISleTS;^^^^^ this eicample follows 

r aptitude tests described in the 
preceding discussion. In accordance with the three assumptions of
this rationale, an appropriate test problem is one 1) that duplicates 
operations important to the course or job for which the applicants 
are being selected, 2) that the applicants are already able to perform 
at the time they are tested, 3) that the applicants have practiced 
sufficiently to have reached stable proficiency levels, and 4) that 
results in individual applicant scores that are sufficiently different 
to permit selection. Clearly, the tasks that best meet these specifica- 
tions for groups with totally different backgrounds are unlikely to be 
the same; and clearly the tasks appropriate to Group A cannot be
considered to be intrinsically "inferior" or of a "lesser standard"
than those appropriate to Group B. Adaptation of the test
need not at all reduce the accuracy of selection that has traditionally
been associated with a given test rationale. 
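The fourth specification, that scores must differ enough to permit selection, is the one most easily checked once tryout data exist. The sketch below is a hedged illustration only: the tryout scores are invented, and the function name `permits_selection` and the minimum standard deviation of 2.0 are assumptions made for the example, not values drawn from the research.

```python
from statistics import pstdev


def permits_selection(scores, min_sd):
    """Return True if the tryout scores are spread widely enough
    (population standard deviation at least min_sd) to rank applicants."""
    return pstdev(scores) >= min_sd


# Hypothetical tryout results on a 20-point exercise.
# In the first group nearly everyone earns the maximum score,
# as could happen when a skill is very highly practiced.
near_ceiling = [19, 20, 20, 20, 19, 20]
well_spread = [8, 12, 15, 17, 11, 19]

ceiling_ok = permits_selection(near_ceiling, min_sd=2.0)
spread_ok = permits_selection(well_spread, min_sd=2.0)
print(ceiling_ok, spread_ok)
```

A task that fails this check may still satisfy the other three specifications; it simply cannot distinguish among the applicants who take it.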


Whether a given task actually meets the above specifications 
whether estimating bowls of rice is in fact an effective pre ^ ^ 
an empirical question. Until the test is actually » ii mav be that 
wise to prejudge the results. In the ““ 

estimatUig volLe is too different from the kmds 
volved in the actual job to meet the first of the ^ 

effective predictor. Or it may be that ^e "’“‘^TKpeJe cultur, 

not be met because this skiU is so highiy 
that all applicants would earn the same f “f- 
tryout dam. there is no way of teUbig. 

problems or any other novel exercises s o,i ^ 

othand -so long as the rationale Is Llture. 

conform to the ■■standard" exercises used m me America btit 
And mat is me first crucial point to be made about specific 
adaptations. 


The second point is an “ rb“s^ 

Before embarking on an S ^ J ,.,. gnv are sufficiently 

mat the backgrounds^^ the^loiml^ ppl^^ The backgrounds of Kpelle 
different to re q uire differ 33timation exercises are to 

b"%Xrtil mts was Shown, it could no. be taken for granted. 


The AID/AIK flndings ^"rrSo^rre\rai^^^^ 
usually do require educational levels, me dUfer- 

versions. But at ier-it the university level, some (but 

ences are progressively ^ without modification. Unless 

not aU) standard ‘“‘^mset^ident! me administrator should ask 
Tol "d:m s“J^t the standard versions cannot be applied. He 
should understand, in accordance with the above principle, that adapta-

corotov°“,haf'f"'‘l[“^ ^ “ accordance with this 

corollary, that it wUl not necessarily improve a test, either. 


that mmt ““'crns the amount ol background research 

ttot must be done to pmpomt the kinds of adaptatims reoulred It is 

S the Kpel?^- of the type that Cole carried " 

not be constructed But it esUmatlon test probably could 

of the most costly ’stens in background research is one*; 

in the longest delays hor * construction, and the one that results 
From thellS^il,tt^.‘t“ Tj' -“P'cted. 

“ ways even more debmtating^'°n7d “ 


adoptrt tor'”t^^“se's^„f;;,'*/^=^ "oearch that were 

are recommended for oth^^r proved serviceable, and 

first was to do only such prehmii^r!f development programs. The 
ggecif^q material for tho » f ^ would produce 

were tobJ^rltten. 

would not be ipl„ed The se^a '’‘^"cr fascinattog, 

afflcienu sstl^ercit.?^^^!?;^ "-^^^ f coon as there wel’e 

fgsts nroviH. ‘eh the level m accuracY 

cesUrch would 

““ ‘“c “cre basic reseich V '“““"-“P studies, 

carch etforl would be sbilted to other tests. 


.w. wuic*. tBSm. 

muslrat‘'S~ “^CI-D Verbal Analogies Test to Nigeria 
an item such as “w^eriais be studies in school. Thus, in 


coconut and climb 
cassava and ? 

( ) eat 
( ) cook 
( ) dig
( ) earth

^e concept of how 

«Iatioaahlp m would be the 
There were two major to identUy a sufficient

test exercises of this general iW®- difficulty for twelve- 

number of concepts that w®®® skill mich as concept formation, 
year-old Nigerian students. In a „-,isHons. The other was to 
there probably are wide differences in concept 

ensure that the test did measure f s^nmnd of the English 

formation, and not in the , requires. Knowledge oi ® 

vocabulary that solving these ,^ould surely be influenced 

second language learned mamly m primary schools a 

by the irrelevant differences m ’“f.Levement tests as predictors, 
were noted in the earlier discussion of achieve 

With respect to the first, it was decided not to invest in elaborate
research. A study of the development of concepts in the Nigerian
culture would have been interesting, but was not necessary to the
development of the test. Simply writing down concepts that came to
mind as appropriate, and trying them out was far more expedient, and
this was the approach adopted.


1 “s“ sssr 

For there is no ®h®y "f^ftowSh vocabulary “°icL 

and determining the extent to wnic^^ development of an empiricm 

And so it was decided to invest in the development of an empirical
list of the English words that the students know, for use as
source material in writing these verbal test items.


e material in wrltmgu.,^. English themes from 

The procedure was to ®°“®'^Sooations, and to tabuWe how 
primary schools ^ ^‘/"g^ds at l®®®‘ mr a few doxen 

many students aged were that point, a short- 

At first, all of the obtained. But ai ^ certain 

themes an " ued. Whenever “ be near the head 

cut approach was aged as to b eatch up with 

word was ®‘“’® „ [-(requent that it ““oSent tabulation. And 

meTeadms!it«a®"®‘-"^^^^ 

:rtr^;e» 
Its contribution to the professional literature on English usage was
virtually nil.
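The tabulation described above, counting how many students used each word at least once in their themes, can be sketched roughly as follows. This is only an illustrative sketch: the function name, the tokenizing rule, and the three sample themes are assumptions invented for the example, not the procedure or data actually used in the study.

```python
from collections import Counter
import re


def students_using_each_word(themes):
    """For every word, count how many themes (one per student) contain it at least once."""
    counts = Counter()
    for theme in themes:
        # A set ensures that a word repeated within one theme is counted
        # only once for that student.
        words = set(re.findall(r"[a-z']+", theme.lower()))
        counts.update(words)
    return counts


# Hypothetical themes, invented purely for illustration.
themes = [
    "My father has a farm. The farm is big.",
    "I go to the farm with my father.",
    "The market is far from my school.",
]

counts = students_using_each_word(themes)
for word, n in counts.most_common(5):
    print(word, n)
```

Words used by most students would then be candidates for test items, while words used by only a few would be avoided.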

In the development of a similar test in Thailand, moreover, it
was decided not to repeat even this fragmentary research procedure,
for in Thailand the second-language problem does not exist. An
appropriate verbal analogy test in this country can and should be
written in Thai. Accordingly, the Thai version was developed by trial
and error, with no background research whatsoever.

Despite all of these apparent shortcomings, however, this 
approach did produce operational tests fairly quickly; and these tests 
did turn out to be highly effective predictors. And this should be the 
overriding concern of the test constructor working against time to 
meet urgent operational needs in a developing country. For, although
basic research surely is vitally important, it is more appropriately
carried out in contexts other than that of an applied test development 
effort. 


The spatial and perceptual tests described in later chapters
will afford additional illustrations. For the developmental procedure
used for these types of tests might well be described as troubleshooting
rather than as research. Each experimental study was addressed
specifically and wholly to the perceptual difficulties that the
examinees were having in coping with the particular set of test exercises
then being adapted. In addition, each was addressed solely to
a practical solution of these difficulties, not to an explication of the
principles or dynamics involved.

This ultrapragmatic approach is understandably distasteful to
most properly schooled researchers. And herein lies the major
implication for the administrator evaluating the project design. Often
it will have to be he who must ask whether a study suggested is
really necessary to the job to be done. Differentiating the necessary
from the "instructive" is a third important consideration in planning
the developmental procedure.


hi? o£ the NlEerian students could 

‘‘“d "sed lor this type ol test. But because there are 

wpro^h^EnSTI?? Whh h" impractical 

lhrouEh<n;t the e?ilry 
A fourth and final point is that effectively adapted testing procedures
may in and of themselves not be enough to ensure the success
of the students or trainees being selected. The course may have to 
be similarly adapted. Even though the test may be highly effective in 
selecting the most able of the applicant group, these very able individ- 
uals may nevertheless not be ready to profit from the standard course 
of instruction. The more the test content has had to be adapted, in 
fact, the less ready they are likely to be. 

This difference between ability and readiness is shown perhaps 
most clearly in the case of the I-D Mechanical Information Test, 
which was one of the most successful of the AID/AIR test adaptations. 
It was developed as a means of selecting applicants for technical 
training institutions in West African countries. And the problem of 
test adaptation, of course, was to measure mechanical aptitude in 
individuals who had little prior experience with modern mechanical 
devices. 

The adaptation process began, as always, with a standard test 
rationale. It was assumed that— 

1. If certain mechanical and scientific phenomena are readily 
observable in each applicant's everyday surroundings, 

2. if certain of the applicants have learned more than others
from the incidental learning opportunities that these phenomena provide,


and 

3. if the reason for this unequal learning lies in the applicants'
unequal interests and inclinations,

then a test of familiarity with these phenomena should identify the 
applicants who are most inclined toward technical careers. The task 
was to find technical phenomena that are readily observable in rural 
Africa, since the gadgets used in the American applications of this 
rationale obviously would not do. 

Finding and validating an adequate number of suitable test items
took nearly a year.4 But the resulting test was highly effective for
a wide variety of applications in Africa and, with some further modifications,
in other parts of the developing world. Its validity in selecting
technical trainees was consistently as high as or higher than that
being attained by the standard version in the United States, as will be
seen from the data presented in later chapters. The test clearly
accomplished its mission of picking out the best of those who apply.


How the best of the African applicants, as identified by this test,
compare in ability with the best of the American applicants, as identified
by the standard test, cannot readily be determined. They might
be the same, or poorer, or better with respect to potential. But it
was clear that the best African applicants were considerably less
ready than the best of the American trainees. For while an African
who can answer a test question about an oil lamp may show as much
basic aptitude as an American who can answer a question about an
electric light, he will operate under a considerable handicap when he
enters a course in which electricity is the topic of instruction. An
individual who has not the background to answer questions about
electricity (or gears or pulleys) when they appear on a test paper
will have similar difficulties when he encounters them in a textbook
or lecture.


Put another way, the content of the standard mechanical test is
much closer to the content of the curriculum than is the content of
the adapted version. And the very fact that the I-D tests had to move
further away from the actual curriculum to be equally effective shows
that the African examinee is less well prepared for the course than an
American youngster of equivalent ability and potential. Selecting
him with a test appropriate to his background and then subjecting him
to an imported curriculum geared to the background of Americans
does not make sense. And yet this is frequently done.

The implication is that the administrator evaluating a testing
proposal should also consider the suitability of the course for which
the test is to be used. If the plan is to use an imported curriculum
with standard instructional materials at the established rate of speed,
the development of culturally appropriate ability tests may well be
a waste of time.


In summary, then, four major points have been made about specific
test adaptations. The first was that changes of content do not
cheapen or degrade a standard test, and do not imply that less able
applicants will be selected. Often, adapting a test is the only way of
identifying people as able as those selected in other countries with
the traditional versions. The second was that although adaptation is
usually necessary in a developing country, there can be exceptions;
and the need for adaptation should be verified before the project
begins. The third was that elaborate background research is seldom
necessary to produce effective tests and should not be programmed
when there is an immediate need for operational testing procedures.
And the fourth was that training failures can as easily result from
deficiencies in the course as deficiencies in testing, and that a project
which would cure only the ills of the tests may therefore not yield
the improvements desired.


THE TESTING MECHANICS 

In the early planning of the AID/AIR research, an important 
decision was made about the mechanics of the tests to be developed. 
The objective would be to develop tests that could be administered to 
large groups of applicants by people with little or no background in 
testing; that cost no more than a few pennies each; and that were 
suited to rapid scoring by hand or machine. Tests that would have 
to be administered individually, that would use apparatus, or that 
would require professional testers were rejected as too costly for 
use in most developing countries.

This decision led to the intensive studies of the mechanics of
testing that dominated the early months of the research. For it was 
soon found that taking a test is a highly skilled procedure that an 
examinee unaccustomed to objective testing cannot possibly master 
on the basis of the instructions typically provided. However adequate 
the standard test forms might be for the test-wise American or 
European youngster, they posed an impossible challenge for examinees 
who had not had this highly specialized training. Not infrequently, the
skilled operations that the examinees had to learn to comply with the 
mechanics of the test exceeded in number and difficulty the operations 
involved in the specific skill the test was to measure.

Accordingly, it was necessary to change the format of the test 
paper, the manner in which instructions are given, the protocol for 
demonstration and practice sessions, the procedures for enforcing 
time limits, and other special conventions; nearly all aspects of modern
test practice had to be reengineered. The exact nature of these
changes will be described in Chapters 4 and 5, and therefore need 
not be elaborated further in this discussion. For the present purpose, 
it suffices to note that all this had to be done, at least at the primary
school level, and that when done, the resulting tests did meet the criteria
of economy and efficiency that had been established at the beginning
of the research. 

One implication of these findings relates directly to the earlier 
suggestion that the adequacy of standard test exercises should be
checked before extensive adaptation studies are undertaken. To find
out whether certain content material is suitable in the local situation,
it clearly is necessary first to ensure that the mechanics used in 
presenting it are not themselves the source of the difficulties that 
may be noted when it is tried. Otherwise, entirely suitable material 
may be ruled out because of trivial mechanical malfunctions. Casting 
the standard exercises into the format recommended in Chapter 5, 
and then evaluating their suitability will provide a much sharper 
guide to the need for content revisions. Even when the examinees 
are reasonably sophisticated, the administrator should require such 
editing of the mechanics as a prerequisite to content evaluation. 

A second implication is that proposals for the development of 
apparatus tests or individually administered procedures should gener- 
ally be rejected. Except for a few high-risk applications (such as the 
selection of airline pilots or the certification of physicians, for example),
the cumbersomeness and expense of such tests should be avoided.
It is almost always possible to devise a mass-administered test of 
the same ability, using pencil and paper alone. 


RECONCILING THE VIEWS ON
TEST ADAPTATION 

With this general background, it may be useful to reexamine 
the two examples that began this discussion. If the above criteria 
had been applied in these instances, what would have been the adminis- 
trator's decision? 


In the case of the English proficiency test, analysis would have
shown that this issue was not really a testing problem. For the rationale
of a proficiency test is that:

1. if a given skill has been defined as consisting of certain
specific items of knowledge and of certain specified skilled operations,

2. if a test is constructed that measures an adequate sample
of this knowledge and these operations,

and

3. if the examinees do not know in advance which items will be
included in the sample actually measured,

then the test will provide an accurate estimate of the degree to which
mastery of the total set of component items has been achieved. The
task of the test constructor is simply to select an adequate sample
of the specific items of skill that the people who developed the curriculum
decided to include in the course as appropriate teaching objectives.
In this straightforward sampling procedure, issues of culture or
background do not arise.
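The sampling logic of this rationale can be illustrated with a small simulation. It is a sketch under stated assumptions, not anything drawn from the study: the domain of 500 items, the 70 percent mastery level, the 50-item test length, and the function name are all hypothetical.

```python
import random


def estimate_mastery(mastered, domain, sample_size, rng):
    """Estimate the fraction of the domain mastered from a random sample of items."""
    sample = rng.sample(domain, sample_size)
    return sum(item in mastered for item in sample) / sample_size


rng = random.Random(0)                   # fixed seed so the sketch is repeatable
domain = list(range(500))                # hypothetical: 500 items defined by the curriculum
mastered = set(rng.sample(domain, 350))  # hypothetical student who has mastered 70 percent

true_mastery = len(mastered) / len(domain)

# Give the same student 200 different 50-item tests drawn from the same domain.
estimates = [estimate_mastery(mastered, domain, 50, rng) for _ in range(200)]
mean_estimate = sum(estimates) / len(estimates)

print(f"true mastery: {true_mastery:.2f}")
print(f"mean of 200 fifty-item tests: {mean_estimate:.2f}")
```

Any single test will miss the true figure by some margin, but because the items are sampled at random and are not known in advance, the estimates center on the student's actual mastery of the full set.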

The development of different proficiency tests for different 
parts of a country, therefore, is not a methodological test adaptation. 
What it would in fact do would be to establish a different teaching 
objective for each different geographic location. Asking the educational 
system to define and teach a set of different "Englishes" throughout 
the country can hardly be considered an effective solution to the 
problem of interstudent variation. 

The indicated solution, if the problem is indeed serious enough
to warrant remedial action, is to use different curricula and instructional
methods as appropriate for these different language groups,
and thereby enable them all to attain the established teaching objective.
A proposal for changing either the test or the objective on the grounds 
of background differences alone should be rejected. 

Yet, it should also be noted that under somewhat different cir- 
cumstances the objections to the traditional test that were raised in 
this example would be well founded. Were this same test being used 
as an ability test to select students for advanced education, the differ- 
ences in their respective backgrounds would be vitally important. 

For now the applicable rationale would be that of the aptitude test, 
which does assume that the observed differences in achievement are 
the result of the applicants' individual learning skills rather than 
other factors. And the extraneous differences in their linguistic 
backgrounds might therefore rule out the use of this test, as did the 
differences in quality of prior instruction in the earlier discussion
of secondary school selection procedures. 

Thus, the reconciliation of the extreme views expressed in this 
situation depends entirely on the intended test application. If the 
purpose is simply to assess each student’s present status with respect 
to the knowledge and skills the test actually measures, differences
in background should not be considered. If the intent is to use these 
assessments to infer something about the respective abilities of the 
higher and lower achievers, and to generalize this to their future 
learning potential, their backgrounds somehow must be equated. 

Neither view is always right or always wrong, as is usually the case
when extreme positions are being debated.
In the case of the American firm's employment practices, there
was again some justification for both positions. The manager's focus
on the equivalence of the job requirements at any and all locations was
sound, but should have been directed at the test rationale rather than
the individual test questions. He should have insisted that the same
types of skills be evaluated in this country as had been found to be
predictive in the United States, but should have realized as well that
appropriate content modifications would not necessarily reduce the
test's accuracy of prediction. An adapted form might well have
been even more accurate in this country than the standard version is
in the United States.

The professor's basic thesis, therefore, was sound. But his 
specific suggestions were somewhat misguided. The exercises that 
were not familiar to the examinees should not have been dropped 
entirely; rather, they should have been replaced with alternative 
measures of the same type of skill. And such "obvious" anomalies as 
the use of dollars and cents should not be corrected unless and until 
their inappropriateness has been demonstrated in empirical studies.
Cosmetic changes of this general type usually have little effect, and
may actually be harmful when there is an underlying skill (in this
instance, working with decimals) that is more or less universal.

The mechanics of the test should have received special attention
along the lines suggested above. And the need for adaptations of the
standard training course for this job should also have been considered.
All of the above suggestions, in fact, should have been applied in this 
situation, which encompasses in miniature the full range of testing 
problems encountered in a developing country. 

This, at least, is the indicated approach from the technical 
point of view. A proposal for adapting the company's test in this 
manner would pass technical muster. But whether the gains would 
justify the fairly sizable costs inherent in the indicated procedure is
a separate issue that the administrator must also examine. And this
is the issue to be considered in the following chapter.


A SUMMARY CHECKLIST OF 
TECHNICAL CONSIDERATIONS 


The discussion has suggested eight major questions that should be
asked in evaluating a test adaptation proposal. They are
summarized below in a somewhat different order,
to reflect the sequence in which they normally would be considered:
1. What is the rationale for using the proposed test for this
particular testing problem? 

2. Has this rationale been effective in meeting similar needs 
in other locations? Are there alternative rationales that have been 
found to be even better? 

3. Are the specific assumptions that are inherent in this rationale
consistent with local conditions? 

4. Have the test items that are traditionally used in applications 
of this rationale actually been tried in this country to verify that they 
are deficient? 

5. Is it certain that the deficiencies noted in the traditional 
items are attributable to the test content rather than to the testing 
mechanics? 

6. Will all of the background studies being proposed contribute 
directly to the content of the new exercises to be prepared? Are 
there any nonessential aspects of this research that can be deferred? 

7. Will the resulting test meet the practical constraints on costs, 
examiners, and logistics? Are there alternative formats that are 
more economical or efficient? 

8. Is the curriculum for which the examinees are being selected 
sufficiently attuned to local conditions to make superior selection 
practices truly worthwhile? 

The answers to many of these questions, it may have been noted, do 
not depend on "cultural" issues at all, if this term is used narrowly 
to refer only to a people's habits, skills, and traditions. Differences 
in the development of educational systems, for example, are seldom 
attributable to cultural variations among the countries being compared. 
But this is the term that has generally been used in the context of 
test adaptation, and it will continue to be used in this very broad sense 
throughout the handbook to refer to all of the special needs of the 
developing countries. 


NOTES 

1. For a fuller discussion of these points, see P. A. Schwarz, 
"Prediction Instruments for Educational Outcomes," in R. L. Thorn- 
dike, Educational Measurement (Washington, D.C.: American Council 
on Education, 1971). 
2. Michael Cole, "The Puzzle of Primitive People," Psychology
Today (March, 1968), 48-49. 

3. We are grateful to Barry Eisenberg of the Peace Corps for 
collecting these themes, and to Joan Kontos for designing and super- 
vising the tabulation procedure. 

4. Much of this work was done by Professor Frank Scott of 
Western Michigan University as part of his doctoral dissertation.



CHAPTER 



TESTING REFORM 
AS AN 
INVESTMENT 


Having determined that a proposed test development program is 
sound from a technical point of view, the administrator next must try 
to assess its priority relative to the many other projects that usually 
are pending. For in most developing countries, it is impossible to 
fund and staff more than a small fraction of the programs that clearly 
are "sound" and that clearly are "urgent"; and the administrator has
in effect to decide which of the programs that cannot possibly wait 
shall be put off still further. The relative payoff of the proposed in- 
vestment is a second crucial consideration. 


Computing the benefits of a program in terms of its overall
contributions to "national development" has proved to be a most difficult
proposition. Even for apparently straightforward economic investments,
no one has yet been able to do this precisely. In such a field
as testing it is hopelessly unrealistic.1 How much a test development
program would contribute relative to, say, an expansion of the facilities
for teacher training simply cannot be reduced to numbers. But the
administrator can and should be provided with as much data on the
anticipated payoffs and the projected costs as can be assembled.
Especially when the costs will be high or when external aid will be
required, such information is essential to what often becomes a most
difficult tradeoff decision.


This chapter describes the types of cost-effectiveness data that
can be assembled when the aim of the proposed project is limited to
the improvement of the actual testing procedures. For the present,
it will be assumed that the facilities and institutional infrastructure
necessary to implement the program already exist, and that the
trained personnel the project requires are already on hand. Whether
to invest money and time in the development of new and hopefully better
methods will be the only consideration. Then, Chapter 8 will extend
the discussion to the more typical situation in which the necessary
capabilities and institutions also must be created.


THE PAYOFF POTENTIAL 

The net payoff of a change in the present testing procedure
depends on three major factors. On the positive side, there are 1)
the improvements that will result in the accuracy of the decisions
that are being made on the basis of the candidates' scores, and 2)
the reductions of the present logistic difficulties of testing that will
be effected. On the negative side, there are 3) the new problems that
will arise as a result of change in the established procedure. Generally,
all three factors should be considered.


Improvements in the Accuracy of
Test-Based Decisions 

After the new tests have been developed, the gains in accuracy
that they provide can be measured fairly precisely.* But when their
development is still at the stage of a pending proposal, their eventual
accuracy obviously cannot be determined; only circumstantial evidence
can be assembled.

Such circumstantial evidence can be collected at three levels of
effort, in accordance with the nature of the problem and the magnitude
of the proposed investment. At the simplest level, the data are limited
to the seriousness of the operational problems that have arisen because
of inadequate testing procedures. If the present situation is intolerable,
or if the proposed investment is modest, estimates of the exact magnitude
of the improvement likely to be achieved may not be required.

If the situation is less critical or if the investment is substantially
higher, some quantification of the expected gains should be attempted.


*An important limitation in the accuracy of such after-the-fact
measures is that the criteria for assessing an examinee's later performance
in school or on the job are generally deficient. None of the
criteria typically used (grades, ratings, production records, etc.)
provides an entirely satisfactory index of the quality of performance.
The development of more adequate criteria continues to be the single
most important need of educational and occupational testing.
At the intermediate level of effort, the accuracy of the present procedures
is computed and compared with the results typically obtained in
other countries. The assumption is that similar results will be ob- 
tained here once suitable tests have been constructed. At the highest 
level, this assumption is put to the test by carrying out an actual pilot 
study, using tests similar to those to be constructed. But this can be 
done only, of course, when reasonably suitable tests are available for 
experimentation. 

To illustrate these approaches, examples will be cited from 
studies of vocational selection tests and of secondary school admission 
procedures, which were the two applications most fully discussed 
from the technical point of view in the preceding chapter. Again, each 
example is an event that was actually observed during the course of 
the study. 


Accuracy of Vocational Selection Tests 

It seems appropriate to begin with vocational selection because 
this is how the AID/AIR project itself began.


A senior official of the International Cooperation Administration 
(the predecessor of AID) made an inspection tour of the technical 
training institutes in Sub-Saharan Africa that the U.S. government was
supporting. He found that in terms of facilities and equipment and 
teaching staff these were all first-class institutions, but that their 
output generally was appalling. The numbers of trainees enrolled 
were far below the capacities for which these institutions had been 
designed; and the proficiency of even these few trainees was lower 
than had been expected. The central program objective was not being 
achieved. 


The specialists he consulted attributed much of the blame to the 
lack of selection methods appropriate to the African culture. Suitable 
tests, they felt sure, would do much to resolve the difficulties that he
had noted. But they pointed out also that the development of tests for 
another culture was a highly risky proposition, and that an investment 
in testing might well result in no tangible returns. 


Still, the magnitude of the U.S. investment in technical training
was so very much higher than the costs of the testing project suggested
that he decided this was a risk worth taking. Even a small improvement,
translated into U.S. training dollars, would represent a handsome
return.2
Thus, in the case of the initial AID/AIR project, it was the first
of the above approaches that was applied. The judgment of high payoff
potential was based not on an estimate of the degree to which the accuracy
of selection would be upgraded, but on an assessment of the value
of each increment in accuracy that might be achieved, which was in
this instance the overriding consideration. And this criterion is generally
adequate when the present methods entail losses that the administrator
cannot afford to ignore.


When the magnitude of the losses is less compelling, an estimate
of the probable gain is usually necessary for an adequate project appraisal.
This was the case in the AIR survey of testing needs in Malawi,
which (though directed at institution-building rather than at reform of 
any one testing procedure) affords a convenient example of both the 
second and third of the above approaches. 


One of the specific test applications examined during a first
and highly preliminary survey was the selection program of the
Malawi Polytechnic, which is the country's premier vocational and
technical training institution. Special attention was given to the Basic
Technology Course because it was at this junior level that the most
serious problems had been reported.


The records showed that the present class of 60 trainees had
been selected from a group of 1,350 applicants by a three-stage selection
procedure. The first had been to obtain ratings on their character,
scholastic ability, and attendance from the primary schools that
they had attended, and to eliminate all applicants who had less than
straight-A ratings on these three characteristics. This left 200 of
the original applicant group. The second stage had been to submit
the application forms of these 200 to a central review panel, which
selected the 120 who seemed best on paper. And the third had been
to give this further reduced group English and arithmetic tests, and
to have each applicant interviewed by an experienced instructor as
the final admission hurdle.


Yet, despite this methodical procedure, fully a third of the
entrants had been found to be unsatisfactory even before the first
term was completed. Five had been dismissed, six had been placed
on probation, and nine others were being given one last chance before
formal action was taken.


In light of the favorable selection ratio (22 applicants per
trainee admitted), and in light of the much better results that had
been consistently obtained in West Africa with similar groups, it was
concluded that the selection procedure was indeed deficient. And it
seemed safe to predict that the introduction of suitable aptitude tests
would lead to substantial improvements at or about the level of the
West Africa findings.
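The figures reported for the Malawi Polytechnic can be checked with a few lines of arithmetic. The numbers below are those given in the text; only the variable names are introduced here for illustration.

```python
# Figures as reported in the text for the Basic Technology Course.
applicants = 1350
admitted = 60

# Applicants per trainee admitted (the text rounds this down to 22).
selection_ratio = applicants / admitted
print(f"applicants per trainee admitted: {selection_ratio:.1f}")

# Entrants found unsatisfactory before the first term was completed.
dismissed, on_probation, last_chance = 5, 6, 9
unsatisfactory = dismissed + on_probation + last_chance
share = unsatisfactory / admitted
print(f"unsatisfactory entrants: {unsatisfactory} of {admitted} ({share:.0%})")
```

The 20 unsatisfactory entrants are exactly one third of the class of 60, which is the proportion cited above.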


Given the limited objective of the preliminary survey (to determine
whether the situation was sufficiently promising to warrant a
feasibility study in depth), this midlevel approach of drawing on experience
elsewhere provided an entirely adequate basis for estimating
the payoff potential. Had the focus of the survey been limited to the 
Malawi Polytechnic alone, it would in fact have provided an adequate 
basis for even the final decision of whether the development of new 
tests for this institution should or should not be undertaken. 


But because of the high cost of the professional and institutional 
development that would have to be carried out in Malawi to provide the 
necessary infrastructure for testing, this and the other preliminary
findings justified only a more intensive feasibility check. And for this 
purpose, the highest of the three levels of evaluation approaches was 
judged to be the appropriate procedure. 


In the second study, the survey team administered a set of the 
I-D vocational aptitude tests to this same class of Polytechnic trainees 
(at this point only 49 of the original 60 remained), and compared their 
scores with the grades they had been given by their class and shopwork 
instructors. It was found that the test scores and grades were indeed 
highly related, as measured by a standard coefficient of correlation. 
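
As a minimal sketch of the comparison just described, the following Python computes a product-moment correlation between aptitude test scores and instructor grades. The eight pairs of values are invented purely for illustration; they are not the Malawi Polytechnic data, which the text does not reproduce.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical aptitude scores and instructor grades for eight trainees.
test_scores = [31, 42, 25, 38, 45, 29, 36, 40]
grades = [55, 70, 48, 66, 78, 50, 60, 72]
print(round(pearson_r(test_scores, grades), 2))
```

A coefficient near 1 means that trainees who scored high on the tests also tended to receive high grades; a coefficient near 0 means the two rankings are unrelated.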


Then, this purely statistical relationship was translated into 
more meaningful operational terms. First, it was assumed that the 
25 trainees who had the highest overall grades in class and shop were 
generally successful and represented "good" selection decisions, and 
that the 35 lower performers (including the 11 dismissals) were not 
up to par and represented selection "misses." Then the statistical 
findings obtained for the I-D aptitude tests were converted into the 
numbers of selection successes and misses that would probably have 
occurred if these tests had been included in the selection procedure. 


The results showed that if the original pool of 1,350 applicants 
contained a sufficient number of potentially successful trainees (and 
experience elsewhere suggested that it almost certainly did), use of 
the I-D tests would probably have resulted in 59 "good" selection 
decisions; the one "miss" would probably not have been so poor a 
performer as to warrant dismissal. Or, stated a bit differently, 
the results indicated that 59 of the trainees would have attained the 
level of proficiency that was presently being achieved by only the top 
25. An additional 34 would have been "good" performers. 
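
The conversion from a correlation to counts of selection "successes" and "misses" can be illustrated with a small simulation. The sketch below is not the survey team's actual computation, which the text does not spell out. It assumes that test scores and later performance are jointly normal, and the validity of 0.6 and the 20 percent base rate of potentially successful applicants are hypothetical values chosen only for illustration. The pool of 1,350 applicants and the 60 places are taken from the text.

```python
import random
from statistics import NormalDist


def expected_good_selections(validity, n_applicants, n_selected,
                             base_rate, trials=100, seed=1):
    """Average number of 'good' trainees among the top scorers.

    validity  -- assumed correlation between test score and performance
    base_rate -- assumed share of applicants who would perform well
    """
    if not 0.0 <= validity <= 1.0:
        raise ValueError("validity must lie between 0 and 1")
    rng = random.Random(seed)
    # Performance level that only `base_rate` of all applicants exceed.
    cutoff = NormalDist().inv_cdf(1.0 - base_rate)
    noise_sd = (1.0 - validity ** 2) ** 0.5
    total_good = 0
    for _ in range(trials):
        pool = []
        for _ in range(n_applicants):
            test = rng.gauss(0.0, 1.0)
            performance = validity * test + noise_sd * rng.gauss(0.0, 1.0)
            pool.append((test, performance))
        # Admit the highest test scorers, then count those who perform well.
        pool.sort(key=lambda pair: pair[0], reverse=True)
        total_good += sum(1 for _, perf in pool[:n_selected] if perf > cutoff)
    return total_good / trials


for r in (0.0, 0.6):
    good = expected_good_selections(r, n_applicants=1350, n_selected=60,
                                    base_rate=0.20)
    print(f"validity {r:.1f}: about {good:.0f} of 60 selected would succeed")
```

With a validity of zero the selection is no better than chance, so the expected number of good selections is simply the base rate times the number of places; any gain above that figure is attributable to the test.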

Selecting the top 25 as the yardstick for comparison was, of 
course, an arbitrary decision. The top 20 or 30 or any other similar 
number could have been used instead. The point was to report the 
results to the administrator in a form such as "number of trainees 
who will attain a significantly higher proficiency level," and thereby 
to enable him to assess the magnitude of the gain in terms of the 
tangible improvements to be expected. This is the relatively high 
degree of quantification that is always available from the third of the 
above assessment approaches. It will be described and illustrated 
further in Part III of this handbook, in the review of past I-D test 
applications. 

Accuracy of Secondary School 
Admission Procedures 

Each of these three assessment approaches can be applied to 
school admissions in a manner exactly analogous to that described 
above for vocational selection. One can look at the high cost of fail- 
ures or at the accuracy that is being achieved here relative to that ob- 
tained at other locations, or actually compute the magnitude of the im- 
provement likely to be effected. The vocational selection examples 
illustrate the standard approaches that are appropriate for all types 
of test applications. 

The following examples illustrate some of the possible variations 
in the ways in which these three methods can be applied. The first is 
an example of one of the many criteria other than student performance 
that may lead an administrator to decide that the present situation 
must be corrected. 

When the results of the secondary school admission tests were 
announced, one part of the country obtained far less than its "fair 
share" of the available places relative to its proportion of the country's 
total population. This happened also to be the region that had the 
poorest quality of primary schools in the country, and deliberate 
discrimination was charged. The tests used, it was claimed, were 
designed to deny admission to students from these disadvantaged 
primary schools so as to perpetuate the dominant position of the 
ethnic-political group in power. 


The charge became a major political issue, debated at 
parliament level. Action had to be taken. 

In this situation, a proposal that on technical grounds promised 
to mitigate or resolve this problem would almost certainly have 
received immediate attention. Because of the high value of each unit of 
payoff, quantification of the total payoff anticipated would probably not 
have been required. 

Had the same type of deficiency been uncovered by the pro- 
fessional community, however, or had there not been such severe 
political repercussions, a more precise assessment would have been 
necessary to justify an investment in testing. The following three 
examples, based on a manifestation of the same problem in another 
location, illustrate the successively higher levels of quantification that 
can be attempted. 

The principal of a government secondary school in northern 
Nigeria maintained records of the scores that all of his entering 
students had earned on the admission examination. After a period of 
several years he systematically compared these scores with the 
students' subsequent academic performance. 

He found that during their first year of secondary schooling, 
the students with the top scores on the entrance test did indeed 
perform much better than the students who barely qualified for 
admission. But during the second year, the relative superiority of 
these high-scoring students was greatly reduced. By the fourth year, 
the scores on the students' admission tests and the indexes of their 
scholastic performance had no apparent relationship whatsoever. 
Students with low entrance test scores performed as well and in many 
cases better than those who at the time of admission had scored much 
higher. 

He concluded that the present admission procedure selected not 
the students with the highest potential, but those with the best prepara- 
tion. They started out as apparently superior students simply because 
they had learned more at the primary school level and therefore could 
keep ahead of those who in fact had greater ability and potential. 
Their actual abilities did not emerge until sufficient time had elapsed 
to overcome these differences in initial preparation. And he reasoned 
that if the students' actual abilities are not being measured as the 
criterion for secondary school admission, much of the talent available 
in northern Nigeria is no doubt being wasted. 

This analysis is entirely consistent with the earlier discussion 
of the rationale for using achievement tests as predictors, and is 
persuasive from a technical point of view. But for the decision the 
administrator must make, such findings are not sufficient. They 
indicate neither the magnitude of the loss nor the generalizability of the 
data throughout the region. Nor do they even demonstrate that a 
problem exists. If the students admitted were all at sufficiently high 
ability levels, differences in motivation alone would offer an alternate 
explanation for the phenomenon the principal had observed. 


Thus, more information was needed. But extending the above 
approach to a larger sample would have entailed a wait of four 
additional years and thus was unrealistic. It was decided instead to analyze 
the results of the preceding admission test, using a variant of the 
second of the above evaluation procedures. 


The first step was to obtain ratings from the Inspectorate of the 
Ministry of Education on the quality of the primary schools in the 
various portions of the Northern Region. This made it possible to 
identify "advantaged" and "disadvantaged" locations with respect to 
the quality of primary schooling. 


The second step was to compare the admission test scores of a 
sample of students from these advantaged and disadvantaged locations. 
When this had been done, it was found that the two distributions were 
almost nonoverlapping. That is, the highest scores of the disadvantaged 
set were about the same as the lowest scores of the advantaged sample. 
If students from the two groups were competing for admission to the 
same institution, the latter would inevitably be selected. 
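
The consequence of nonoverlapping distributions can be made concrete with a short sketch. The scores below are hypothetical (the text reports no individual values), and the function name is an illustrative choice. It simply ranks all candidates together and counts how many of the available places go to each group.

```python
def places_by_group(scores_by_group, places):
    """Count how many of the top `places` scores come from each group."""
    ranked = sorted(
        ((score, group)
         for group, scores in scores_by_group.items()
         for score in scores),
        key=lambda pair: pair[0],
        reverse=True,
    )
    counts = {group: 0 for group in scores_by_group}
    for _, group in ranked[:places]:
        counts[group] += 1
    return counts


# Hypothetical admission-test scores: the best "disadvantaged" score (52)
# is about the same as the poorest "advantaged" score (53).
samples = {
    "advantaged": [53, 58, 61, 66, 70, 74],
    "disadvantaged": [28, 33, 39, 44, 48, 52],
}
print(places_by_group(samples, places=4))
```

When the two sets of scores barely overlap, every place goes to the advantaged group regardless of how many places are offered, up to the size of that group.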


That such large differences could result wholly or largely from 
differences in student potential seemed highly unlikely, on the basis 
of experience in the United States and other countries. To the extent 
that students from the disadvantaged areas were being denied 
admission to the secondary schools, talent surely was being wasted. And, 
by making certain assumptions about the actual distribution of ability, 
and about its relationship with academic performance (again drawing 
on experience elsewhere), it was possible to estimate the probable 
amount of loss from the students' scores on the preceding examination. 


Here, the comparisons with the findings in other countries were 
far more complex than in the preceding examples. But the analysis 
did provide an immediate estimate which, however crude, was 
preferable to waiting for the results of a proper follow-up study. 


Had there been no potentially suitable aptitude tests to use, 
moreover, the evaluation could not have proceeded beyond this 
midlevel assessment. These findings would have 
had to serve as the basis for the administrator's decision. Further, 
whether they would have been adequate to prompt testing reform would 
have depended mainly on the magnitude of the necessary investment. 


But because scholastic aptitude tests validated for use in 
Nigeria were by this time already at hand, it was feasible to proceed 
to the next higher level. 

Arrangements were made to administer a set of I-D scholastic 
aptitude tests to a sample of last-year primary school students in 
both advantaged and disadvantaged locations, [...] the 
secondary school admission tests were to be given. Then, after the 
administration of the admission tests had been completed, the scores 
on these two types of tests were compared. 

Comparative data were available for approximately 500 students. 
On the admission tests the differences between the advantaged and 
disadvantaged samples were as large as before. On the aptitude 
tests, the average scores of the two groups were almost exactly the 
same. 

This confirmed the earlier conclusions about the existing 
procedures; even more important, it showed that aptitude tests 
relatively unaffected by school quality actually can be constructed. 
Realistic estimates of the improvements to be effected in northern 
Nigeria, therefore, could now be computed. 

In the absence of high [...], this more detailed analysis was the 
appropriate [...] for an investment in improved admission procedures. 

The purpose of these and the earlier payoff examples has been to 
suggest a variety of approaches to the evaluation of payoff 
potential. The important implications for the administrator are as 
follows: 

[...] basic evaluation approaches. 

5. Reasonably accurate [...] quite economically by using the 
appropriate tests of the I-D series. 







The provision of widely applicable tests for use in such advance 
appraisals has turned out to be an important by-product of the early 
African studies. 


Improvements in Testing Logistics 


The subject of procedural improvements is being treated as 
coordinate with the above discussion of quality gains [...]. In 
developing countries, logistic difficulties are far more serious 
than they are in the highly [...]. 

Again, the evaluation [...] indications of the severity 
of the problem alone or [...] of the improvements likely to be 
effected. 


[...] changes are absolutely essential [...]. The following example 
is [...] country. 

[...] primary schools [...] the number of candidates for [...] 
secondary school would rise from 30,000 to 70,000 within a few 
[...]. Traditionally, a set of essay tests had been used as the 
[...] procedure. But even at the present level of 30,000 
applicants, the Examination Division [...] papers by the time the 
[...]. The projected increase in papers [...] objective tests 
was [...]. 

[...] was indeed impossible, change could not be avoided. 


When the difficulties are less apparent, additional data may be 
[...]. The following examples, illustrative of these kinds of 
problems, were observed in a number of different countries. 

In one country the Ministry of Education was trying to upgrade 
the quality of its secondary school teachers by offering special teacher- 
improvement courses during the school vacation. But because most 
of the teachers had to devote this time to the marking of the admission 
tests, they could not attend these improvement courses. 

At another location that also mobilized its teachers as scorers, 
the volume of testing had already risen to the point at which the teachers 
could no longer keep up with the numbers of papers to be corrected. 
It was discovered that one overworked teacher had given half of the 
papers to his eleven-year-old nephew to grade, and a national scandal 
resulted. 


At still another location, the marking of the admission tests 
could not be completed until several months after the school term had 
started. A fairly large number of students who had been admitted 
provisionally pending the scoring of the examination had later to be 
expelled, leading to the predictable repercussions. 


Problems of these types might be considered sufficiently serious 
to trigger reform, or they might not, depending on the investment and 
on specific local conditions. 

When additional information is needed, a more precise appraisal 
of the magnitude of the problem is [...]. Especially when there are 
only isolated symptoms, as in the example of the eleven-year-old 
scorer, a check of the extent to which these problems occur 
nationwide will offer a better basis for deciding on the urgency of 
[...]. Such explorations of the problem itself are variants of the 
first of the above evaluation approaches. 

The midlevel approach of [...] again can be applied to the [...]. 
In the last of the above examples, knowing that the marking that is 
taking so many months here is [...] neighboring country (as it 
actually [...]) [...] highly convenient yardstick for estimating the 
magnitude [...] that probably will be effected. [...] changes, the 
efficacy of mechanical [...] of specific cultural factors, [...] at 
one location can be generalized with considerable assurance to any 
other. 

[...] these generalizations, in fact, that local tryouts are rarely 
required. It is only when there is a specific reason for thinking 
that the suggested procedure will not work that pilot studies should 
be undertaken. This happens most often on issues that have more of 
an emotional than a factual basis, such as the relative merit of 
composition exercises and objective tests.4 If a large number of 
[...] objective tests at least as accurate as the present [...] may 
have to be [...], a demonstration project may have to be carried out 
as part of the evaluation procedure. 

[...] the exception [...], therefore, [...] assessing a test's total 
payoff potential. In much the same way [...] carried out [...] to 
substantive [...] the benefits attributable [...] the testing reform 
should always encompass both [...]. 

The Negative Consequences 
of Testing Reform 

However impressive [...] seem, the possibility of [...] that are 
created [...] in the established order should be explored. [...] 
reduce the net value of [...] projected or [...] corrective action 
[...] cost. Either way, the estimate of cost-effectiveness will be 
affected. 

[...] one of the feasibility studies [...] machines, was a sound 
investment for this particular country. And, on the basis of an 
initial appraisal, 
the advantages seemed overwhelming. The teachers who had tradi- 
tionally been used as scorers clearly could not keep up with the rapidly 
mounting volume of testing, and most of the logistic problems noted 
above were already being encountered. On qualitative and procedural 
grounds, the use of machines was the indicated solution. 


Yet it was also discovered that the traditional method served 
another important function. Under the terms of the government's 
austerity program, the salaries of the country's teachers could not 
be increased; the extra pay they received for marking the tests was 
intended in part as a device for raising their income to a reasonable 
subsistence level. Introducing machines would lead in effect to a 
sizable cut in the salaries of the large number of teachers [...]. 
Any computation of the payoff of automatic data processing that did 
not take this negative by-product into account would have been seriously 
misleading. 

A quite different class of consequences is what is usually called 
the backwash effect. If teaching is geared to a set of external 
examinations, the teachers tend to give the greatest emphasis to the 
topics that they expect the examinations to cover. Topics that 
appear often are taught intently; topics that are rarely included 
are taught sparingly or not at all. 

Thus it can happen that a [...] questions provides [...] of view 
but pedagogically leads to a drop in the teaching of composition in 
all of the [...] classes. This may well be regarded as an 
intolerable [...] that offsets all of the positive payoffs expected. 
Using a combination of objective tests and composition exercises, 
though much more costly, may prove on balance to be the better 
procedure.* 


*A useful compromise that was developed in one country was to 
administer both objective and essay tests, but to score the 
essay tests for only the finalists who were selected on the basis 
of the objective tests. This procedure was not publicized, however, 
to avoid backwash effects. 

Nor is the impact of drastic changes in testing limited to the 
teaching profession. Negative reactions may arise also among 
students, parents, alumni, or any other group accustomed to and 
comfortable with the traditional procedure. Often, an extensive public 
information program, carried out in advance of the change, is another 
necessary investment. 


The varieties and intensities of potentially negative outcomes 
cannot be accurately predicted. Unanticipated consequences almost 
always arise. But identifying as many of these as possible should 
nevertheless be included as an explicit step of the evaluation 
procedure. 


Quantifying the Total Payoff Potential 

[...] range [...] payoffs [...]: the effects of testing on 
investments in technical training centers, on political stability 
within the country, on [...] potentially productive human resources, 
on the effectiveness of teacher-improvement courses, on teacher 
salaries, on teaching [...], and on several types of public reaction. 
What are the [...] of a test's total payoff potential? 

The degree of quantification that can be achieved depends not only 
on the [...] of the assessment method applied, but also on the basic 
nature of the [...] that the new test is to [...]. Such problems as 
"loss of training investments" are inherently more [...] problem of 
adverse public reaction [...] depending on the degree of [...] 
"[...] substantially more proficient trainees" that [...]. 

Even where it is possible to impose a common denominator on the 
different categories of payoffs, such as their respective financial 
implications, the tortured assumptions that are necessary to translate 
each payoff to a common base make such efforts unrealistic. And 
because testing reform usually does have two or more separate payoffs, 
this is another problem in quantifying the total potential. 


Accordingly, the procedure that seems most consistent with the 
present state of the art in payoff assessment is as follows: 


1. Identify each of the anticipated categories of payoff that is 
likely to result from the proposed change in testing procedure, 

2. Treat each category separately in determining the appropriate 
type and degree of quantification (if any) that should be attempted, and 


3. Report the results in the form of a "profile" of payoffs, each 
of which is described qualitatively, and some of which are also reduced 
to numbers. 


The synthesis of this profile into an overall payoff estimate must 
necessarily be left to the judgment of the administrator who will decide 
on the project's implementation. 
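
The three-step procedure above can be sketched as a simple record-keeping structure. This is only one possible layout, with hypothetical class and field names. Every payoff carries a qualitative description, a number is attached only where quantification was attempted, and no overall total is computed because that synthesis is left to the administrator. The figure of 34 additional good performers is taken from the Polytechnic example earlier in this chapter; the other entries are worded for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Payoff:
    category: str
    description: str                  # always given, in qualitative terms
    estimate: Optional[float] = None  # given only if quantification was attempted
    unit: str = ""


def render_profile(payoffs: List[Payoff]) -> List[str]:
    """Return one report line per payoff; deliberately no grand total."""
    lines = []
    for p in payoffs:
        line = f"{p.category}: {p.description}"
        if p.estimate is not None:
            quantity = f"{p.estimate:g} {p.unit}".strip()
            line += f" (estimated: {quantity})"
        lines.append(line)
    return lines


profile = [
    Payoff("Trainee quality", "more trainees reach the proficiency of the "
           "present top group", 34, "additional good performers"),
    Payoff("Scoring logistics", "results available before the term begins"),
    Payoff("Public reaction", "possible resistance to unfamiliar test formats"),
]
for line in render_profile(profile):
    print(line)
```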


COSTS AND COST REDUCTION 

The cost of a new testing program consists of the capital 
investment that is necessary to develop an [...] and 
the recurrent expenses that will be [...] test [...]. 

The major capital costs include the [...]; the conduct of one or more 
[...] of [...] analysis and revision; and the collection of the 
statistical information that will be needed to interpret the [...] 
actually given. The recurrent costs include the printing, storage, 
and distribution of the various test papers; [...] tests at one or 
[...]; [...] analysis, and reporting procedures; and the revisions 
that must be made to change, update, or improve [...] 
administrations. And sometimes there are additional categories of 
expenditures, such as examiner training or public information 
programs, that also must be included. 

[...] of these items in the standard literature of testing [...] 
developing country, they need not be reviewed in detail.5 Rather, 
the focus will be on the few items that are especially susceptible 
to distortion, and on the savings in cost that often can be effected. 


The Capital Investment 

Estimates of the developmental costs of producing a new test 
are necessarily approximations. It is difficult to judge the amount of 
research that will be required to prepare a form ready for trial, to 
predict the number of cycles of tryout and revision that will be 
necessary to produce the operational version, and to anticipate all of the 
contingencies that will arise. Especially when the test is of a type 
never before used in this country, or when it will have to meet stringent 
statistical specifications, the risk of appreciable errors of estimate 
cannot be avoided. 


There is a second category of even larger errors that can and 
should be avoided, however. These are the errors that arise from 
the inclusion of developmental procedures which are not truly essential 
and thereby inflate the budget beyond the amount that actually must 
be expended. The extravagance of background research that does 
not contribute directly to the content of the test exercises, already 
discussed in Chapter 1, is one common example. 

A second example, perhaps even more prevalent, is the heavy 
investment in test norms that has been made in a number of countries. 
Though it is one of the shibboleths of testing that every test should 
have adequate "reliability, validity, and normative data" before it is 
put into use, there are in fact many situations in which norms are 
superfluous; and these situations are the more numerous in a develop- 
ing country. 


Basically, the purpose of normative data is to permit a reasonable 
interpretation of an individual's test score when he is the only one 
being tested or when the decision to be made about him does not 
depend on his performance relative to that of the other candidates who 
are taking the test at the same time (as would be the case, for example, 
if all of them could be admitted). In these situations, it is important 
to know whether a score of "38 out of 50 correct" is good enough for 
admission, poor enough to warrant rejection, or in-between enough 
to suggest putting off a decision until more applicants have been 
[...] testing. Such judgments are possible only when 
a representative sample of individuals has already been tested to 
provide the norms (or distributions of typical scores) 
with which this particular score can be compared. 

When there is a fixed number of applicants for a fixed number 
of places, however, such external yardsticks are not required. The 
applicant with a test score of 38 is accepted if his competitors earn 
lower scores, rejected if they score higher. Information on his 
performance relative to the general population of nonapplicants is 
not pertinent to the decision. 
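
The point that competitive selection needs no norms can be shown in a few lines. In this sketch the applicant names and the competing scores are hypothetical; only the score of 38 comes from the text. The decision depends entirely on the scores of the others in the same pool.

```python
def admit_by_quota(scores, places):
    """Admit the `places` highest scorers; no external norms are consulted."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:places]]


# The same score of 38 leads to opposite decisions in two different pools.
weak_pool = {"applicant_A": 38, "applicant_B": 30, "applicant_C": 25}
strong_pool = {"applicant_A": 38, "applicant_D": 44, "applicant_E": 41}

print("applicant_A" in admit_by_quota(weak_pool, places=1))
print("applicant_A" in admit_by_quota(strong_pool, places=1))
```

Nothing in the function refers to a table of typical scores, which is why a norming study adds cost without affecting decisions of this kind.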

In the developing countries situations of both types arise. But 
usually it is only the huge testing programs for which an investment 
in original test development is at all realistic, and most of these 
are of the competitive type for which normative data are not required. 
The inclusion of normative studies in the design of projects addressed 
to these situations will seriously distort the cost-effectiveness 
appraisal, since norming a test is by far the most costly of the 
preparatory data-collection procedures. 

Similar though less serious distortions can also be introduced 
in the design of the other data-collection activities that are part of 
test construction. The most frequent is the use of larger samples of 
examinees than necessary for the tryout of the preliminary versions. 
Especially in the first tryout, more can often be learned by giving 
the test individually to half a dozen examinees and asking them to 
explain their answers than can be learned from the full-scale analysis 
of a large number of scores. Even in the final reliability checks only 
a modest sample is necessary to determine whether or not the test 
is ready to use, which is generally the only function these checks are 
to serve. Accurate reliability estimates are useful only when these 
estimates will actually be used to adjust or weight the test scores, 
and it is wasteful to invest in the large sample that a more precise 
estimate requires when such adjustments are not intended. 
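
The trade-off between sample size and the precision of a reliability estimate can be illustrated numerically. The sketch below is an assumption-laden illustration, not a procedure prescribed by this handbook. It treats the reliability coefficient as a correlation between two forms or two administrations and uses the standard Fisher z approximation for an approximate 95 percent interval. The coefficient of 0.85 and the sample sizes are hypothetical.

```python
from math import atanh, sqrt, tanh


def reliability_interval(r, n, z_crit=1.96):
    """Approximate confidence interval for a correlation-type reliability.

    Uses the Fisher z transformation, whose standard error is
    1 / sqrt(n - 3) for a sample of n examinees.
    """
    if n <= 3:
        raise ValueError("need more than 3 examinees")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    center = atanh(r)
    half_width = z_crit / sqrt(n - 3)
    return tanh(center - half_width), tanh(center + half_width)


for n in (30, 100, 1000):
    low, high = reliability_interval(0.85, n)
    print(f"n = {n:4d}: reliability between {low:.2f} and {high:.2f}")
```

If the only question is whether the test is ready to use, the wider interval from a modest sample may already settle it; the narrower interval from a large sample is worth its cost only when the estimate itself will be used to adjust or weight scores.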

Thus, the criterion for evaluating a proposed capital investment 
in testing should focus less on the accuracy of the estimates that have 
been made than on the relevance of the items that have been included. 
And the cost items related to background research and to experimental 
tryouts should be given especially careful attention. 


The Recurrent Costs 

The costs of using the test once it has been developed can be 
estimated much more accurately than the developmental expenses. 
If a step-by-step scenario is prepared of the entire process that will 
be followed from the preparation of the copy for printing to the 
reporting of the results, the necessary items of cost can be systematically 
listed and, being essentially logistic routines, costed fairly precisely. 
In estimates of the recurrent costs, problems seldom arise. 

The interpretation of the completed estimate frequently does 
raise problems, however. The evaluation of the recurrent costs is 
generally based not on the absolute amount that will be required, but 
on the cost of the proposed test relative to that of the present procedure. 
In many situations it is difficult or impossible to obtain an estimate 
of the present costs that is truly comparable to the recurrent costs 
of the proposed modification. 
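
A step-by-step costing scenario of the kind described above can be sketched as a simple itemized list. All step names and amounts below are hypothetical placeholders in arbitrary currency units. Each step is assumed, for illustration, to have a fixed component and a component that grows with the number of candidates.

```python
def recurrent_cost(steps, candidates):
    """Total and per-candidate cost of one administration.

    steps -- list of (name, fixed_cost, cost_per_candidate) tuples
    """
    if candidates <= 0:
        raise ValueError("number of candidates must be positive")
    total = sum(fixed + per_candidate * candidates
                for _, fixed, per_candidate in steps)
    return total, total / candidates


# Hypothetical scenario, from preparing printer's copy to reporting results.
proposed_steps = [
    ("prepare copy for printing", 200.0, 0.00),
    ("print and store papers", 100.0, 0.50),
    ("distribute to test centers", 300.0, 0.10),
    ("administer tests", 500.0, 0.25),
    ("score and check-score", 150.0, 0.20),
    ("analyze and report results", 250.0, 0.05),
]
total, per_head = recurrent_cost(proposed_steps, candidates=30000)
print(f"total {total:.0f}, per candidate {per_head:.2f}")
```

The same list can be filled in for the present procedure, but the comparison is meaningful only if both lists cover the same steps, which is the difficulty taken up in the paragraphs that follow.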


One type of difficulty is the strictly mechanical one of extracting 
the information from the budgets and records on file. Usually, the 
office responsible for the conduct of the testing program has other 
responsibilities as well, and separate records of its expenses for each 
different type of activity are rarely maintained. There are single-line 
items for all salaries, all travel, and all printing combined; 
prorating these costs introduces errors of perhaps sizable proportions. 

[...] is that the present procedure may not include [...] 
properly should be included to make it effective; [...] being 
incurred therefore do not give an [...]. If the tests are [...], 
for example, but the check-scoring that [...]. However, there is 
no entirely [...]. 

For these reasons, even the hardest of cost data will usually be 
found to be much less clear-cut than the bare figures at first suggest. 
As much judgment may be required in the interpretation of the re- 
current costs as in the evaluation of the apparently softer payoff 
projections. 


Cost Reduction 

Assessing the cost-effectiveness of a program makes sense, 
of course, only when the absolute amount of the costs could in fact be 
provided within the total budget that the administrator has at his 
disposal. If the costs are excessive on absolute grounds, the proposal 
must be rejected irrespective of its payoff potential. 

This happens infrequently when the proposed reform is limited 
to the testing procedures, as has been assumed throughout this 
discussion; the costs of substantive and procedural revisions alone are 
seldom so high as to be considered excessive. But when this does 
happen to be the case, an alternative approach must be sought that 
permits the necessary reductions in the capital or the recurrent costs 
or both to be effected. 

The most feasible way of accomplishing this is by sharing the 
costs among a larger number of users, so that the investment that 
each one must make is trimmed to manageable proportions. 

The following procedure, first used in Liberia, illustrates an 
approach that is generally useful when it is the low volume of 
testing that makes the costs prohibitive for any single consumer. 

The manager of a local commercial establishment was dissatisfied 
with the performance of a certain [...] and wanted to upgrade the 
[...] testing program. 

[...] clerical [...] specifically to the requirements of each 
position, might offer substantial [...] employment decisions. 
Accordingly, a series of "general office worker" tests was developed 
for use with a wide variety of clerical positions; experimental tryouts 
confirmed that it was indeed generally effective. This solved the 
problem of the excessive capital cost that would have been required 
to develop a separate testing program for each employer. 

The problem with respect to the recurrent costs was that 
applicants for these positions were recruited one or two at a time, 
and administering the tests to such small groups was far too 
expensive. Yet, no employer could expect to "hold" his applicants for two 
or three months while waiting for the group to build up to the thirty 
or more that were needed to conduct an economical testing session. 
The solution was to hold a regularly scheduled testing session for 
office workers "every other Monday" at the central testing center, 
and to invite all employers to send their applicants to these regular 
sessions. Some Mondays, there were only a few; some Mondays the 
room was crowded. But the average per capita cost was sufficiently 
low so that an extremely modest fee met all of the recurrent expenses. 
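
The arithmetic behind the pooled sessions can be sketched as follows. The attendance figures and cost amounts are hypothetical, since the text reports none. The point is only that a fixed cost per session is spread over everyone who attends across the whole schedule, so uneven attendance still averages out to a modest fee.

```python
def average_per_capita_cost(attendance, session_cost, cost_per_applicant):
    """Average cost per applicant over a series of scheduled sessions.

    attendance         -- number of applicants at each session
    session_cost       -- fixed cost of holding one session
    cost_per_applicant -- materials and scoring cost for each applicant
    """
    total_applicants = sum(attendance)
    if total_applicants == 0:
        raise ValueError("no applicants attended any session")
    total_cost = (session_cost * len(attendance)
                  + cost_per_applicant * total_applicants)
    return total_cost / total_applicants


# One employer testing two applicants alone, versus pooled fortnightly sessions.
alone = average_per_capita_cost([2], session_cost=40.0, cost_per_applicant=1.0)
pooled = average_per_capita_cost([4, 35, 12, 41, 8, 30],
                                 session_cost=40.0, cost_per_applicant=1.0)
print(f"alone: {alone:.2f} per applicant; pooled: {pooled:.2f} per applicant")
```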

In a developing country, where there are generally large numbers 
of small employers, and where it is seldom possible to purchase 
suitable "off-the-shelf" tests from publishing houses, this is an effective 
compromise, and one that can be used for public as well as private 
employers. 

In this example, the solution depended on the existence of a 
central testing center. And, as will be noted also in the subsequent 
chapters on institution-building, this is generally a prerequisite for 
significant cost reduction. 


A SUMMARY CHECKLIST OF 
COST-EFFECTIVENESS CONSIDERATIONS 

The preceding discussion was addressed to the many issues that 
arise in evaluating the cost-effectiveness of a proposed change in 
an existing testing procedure. The following eight lines of inquiry 
were suggested: 


1. [...] for which the new tests will [...] quantifying the 
anticipated improvement: [...] appraisal? Or should some 
quantification of the probable payoffs be undertaken? 

2. If [...] are necessary, is it sufficient to compare [...] 
findings in other countries? Or does the magnitude of the 
investment require onsite pilot studies? 

3. If a pilot study has been completed, have the statistical find- 
ings been translated into a measure of the tangible improvements that 
would be effected? 


4. Does the payoff assessment include all of the different 
categories of payoff, in quality and in logistics, that can reasonably 
be expected? 

5. Have the potentially negative consequences of a change in 
the traditional procedure been adequately considered? 

6. Are all of the developmental steps that comprise the proposed 
capital investment really essential? Are the scope and scale of the
experimental tryouts consistent with the applications that will in fact
be made of the data to be collected? 


7. Is the estimate of the recurrent cost of the present approach 
truly comparable to the estimate of the recurrent costs of the new 
procedure? Are there important qualitative differences that should 
also be taken into account? 

8. Are there other similar test needs in the country that could
also be met within the scope of this project, so as to reduce the
magnitude of the per capita investment?

These questions and the ones listed at the end of the preceding
chapter are cumulative, in that both sets must be considered in the
assessment of a proposed change in established procedures.
Similarly, the checklist of institutional issues [...] must be
added to these two for a comprehensive evaluation of an institutional
development proposal.
THE ABILITIES
TO BE 
MEASURED 


In this and the following chapters, the focus of the discussion 
shifts from the evaluation of a testing proposal to the development 
of actual testing procedures— from the concern of the administrator 
to that of the practicing test constructor. The culture-tied problems 
previewed in Chapter 1 will be reexamined in greater detail and 
specific guidelines for resolving them will be suggested. 


The first step, as before, is to examine the basic question of 
what it is that the test should measure. Given a group of applicants 
with specified background characteristics, given a course or job in
which certain types of skilled performance will be expected, and 
given a total of perhaps two or three hours that can be devoted to 
aptitude testing, what kinds of test exercises should be developed? 
What set of skills, measurable within these practical limitations, is 
likely to be maximally predictive? 


To develop the full range of issues that are pertinent to this 
question, the discussion begins with a review of the general methodo- 
logy for developing suitable test specifications for a course or job 
for which adequate tests have not previously been constructed— with 
the basic steps to be followed in all original test construction. Then, 
this methodology will be reexamined for the special case of generaliz- 
ing tests already developed to new geographic locations, to identify 
the steps that can be skipped and those that must be repeated. 


DEVELOPING TESTS FOR NEW APPLICATIONS

As already noted in Chapter 1, the content of an aptitude test
cannot be taken directly from the course syllabus, as is done in the
design of achievement and proficiency tests. Only skills that the
applicants have already acquired at the time of admission can be
assessed. The test constructor must therefore develop a strategy or
"rationale" for selecting from the many dozens of specific skills in
the applicants' present repertoire the few that will best predict their
mastery of the new skills the course will require.


The development of a complete test rationale is a three-stage 
procedure.1 The first is to obtain an accurate description of the
skilled performance that the applicants will be expected to master.
The second is to analyze the characteristics of this performance, to
determine the kinds of abilities that it will require. And the third is 
to decide on the nature of the test exercises that will reliably measure 
these kinds of abilities in the population of applicants to which the 
test will be applied. If the resulting instrument is effective, the test 
constructor can assume that the many inferences involved in this 
process were made reasonably correctly. If not, he must look for 
flaws in his logic and repeat part or all of the developmental procedure.


Description of the Performance Expected 


The two major requirements for an adequate description of the 
performance to be predicted are 1) that it describe this performance 
objectively, in terms of the specific activities required, and 2) that 
it indicate the relative importance of these activities to the overall 
criterion of successful performance that will be applied. Detailed 
descriptions of the actual activities are superior to the more general- 
ized statements that are typically obtained when an administrator or 
teacher is simply asked to describe the outcomes expected, since 
such generalizations as may be appropriate are better made by the 
test constructor, who is more familiar with their measurement impli- 
cations. Priorities are important because all of the component activ- 
ities can usually not be incorporated in a practical testing program, 
and those most critical to success must be determined. 


For the design of educational admission programs, therefore, 
a course syllabus is less adequate as a comprehensive description 
than are lesson plans that show the step-by-step learning process 
that the students will be expected to follow. And copies of typical 
examination papers are less useful as an index of the relative impor-
tance of the various skills taught in the course than are records of
[...] answers to each of these questions. For, from the
[...] performance, which are presumably [...] admission procedure
is to correct, can be quite specifically determined.
Descriptive data for occupational selection programs are more
difficult to obtain because job descriptions as detailed as lesson plans 
and assessment procedures as objective as final exams do not exist 
in most job situations. Usually, the test constructor himself must 
develop the highly specific descriptions that are required. One approach 
is simply to observe the job being performed, which will result in 
statements such as those in the following example. 

When he receives each sales receipt, he copies the sales price 
of each item the customer bought in one of the twelve columns on the
ledger sheet, according to the category of merchandise that this item 
represents. (Every item in the inventory has been assigned to one of 
twelve categories of merchandise, and for these categories separate 
records are kept.) At the end of the day, he totals each of the twelve 
columns of four-digit and five-digit numbers (there is an average of 
about six hundred receipts and 1,800 items processed per day), and 
writes the category total at the foot of each column. Then he adds 
these subtotals to obtain the grand total of sales, which is checked 
against the total obtained by adding the daily sales that each of the 
five salesmen reported. Any discrepancy in these two totals must 
be rectified before the books can be closed for the day.

From such an action-oriented description, a very precise account of this part
of the job is obtained. And some gross judgments of priority also 
can be made on the basis of the data presented. It can quite reasonably 
be assumed, for example, that accuracy in copying the entries of the
sales slip onto the ledgers is more important than accuracy in adding 
the entries together, since the additions can be much more rapidly 
checked than the transcriptions when a discrepancy is discovered. 
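
The bookkeeping routine described in this example can be restated as a short sketch, which also shows why an addition error is cheaper to trace than a transcription error: the totals can be recomputed mechanically, whereas a wrong entry can only be found by going back to the receipts. The function names, the three categories (rather than twelve), and all figures are hypothetical.

```python
from collections import defaultdict


def post_receipts(receipts, category_of):
    """Copy each item's price into its category column and total the columns.

    receipts: list of receipts, each a list of (item, price) pairs.
    category_of: mapping from item to its category of merchandise.
    Returns (category_totals, grand_total).
    """
    category_totals = defaultdict(int)
    for receipt in receipts:
        for item, price in receipt:
            category_totals[category_of[item]] += price
    grand_total = sum(category_totals.values())
    return dict(category_totals), grand_total


def discrepancy(grand_total, salesmen_totals):
    """Ledger grand total minus the sum of the salesmen's reported totals."""
    return grand_total - sum(salesmen_totals)


# Hypothetical data: prices in whole currency units, three categories only.
category_of = {"shirt": "clothing", "hammer": "hardware", "soap": "household"}
receipts = [
    [("shirt", 1250), ("soap", 1100)],
    [("hammer", 2400)],
    [("shirt", 1250), ("hammer", 2400), ("soap", 1100)],
]
salesmen_totals = [2350, 7150]

totals, grand = post_receipts(receipts, category_of)
print(totals, grand, discrepancy(grand, salesmen_totals))
```

A nonzero discrepancy signals that the books cannot yet be closed, but it does not say whether the fault lies in the copying, the column assignment, or the salesmen's own reports.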

Whether such reasonable assumptions about the priorities are 
in fact applicable in this particular situation, however, cannot be 
determined without further study. It may be that transcription mis- 
takes have occurred so rarely that they are in reality less important 
than the more frequent addition errors, because these have too often 
required the salesmen to stay after hours and caused employee dissen- 
sion; or than the errors made in entering the figures in the correct 
columns, which have necessitated costly audits when the sales and 
inventories are cross-checked at the end of each quarter. Or it may 
be that the employees the firm prizes most highly are those who, 
however careless in any of these clerical operations, are alert to 
subtle changes in the pattern of sales, and call them to the manager's 
attention. Clear-cut priorities can seldom be established on the 
basis of straightforward observational data. 

For this reason, and for greater economy in data collection, 
many test constructors prefer to use a more narrowly focused approach 
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that considers only the extremes of very good and very poor employee 
performance. They have concluded, on the basis of considerable 
experience, that in virtually every job much time has to be spent on 
routine activities that contribute little if anything to the over-all quality 
of the employee's performance; and that data on everything that the 
employee does is neither necessary nor appropriate for test construction.
As an alternative to the observational approach, they interview the 
supervisors of the employees or a sample of other individuals directly 
involved in the job, and ask these to recall actual instances in which 
an employee’s actions led to clearly favorable or clearly unfavorable 
results in the past. From a large number of such specific examples,
the "critical requirements" of the job are then inferred.2 For the
above clerical example, a list of positive actions, such as 

1. Noted that salesman had charged incorrect price for an item;

2. [...] that sales in a usually popular category seemed to be
slipping;

and a second list of negative actions, such as

1. [...] when copying sales price for receipt onto ledger;

[...] during [...]
[...] above, for example, it would be possible to give each of the applicants
a stack of receipts, a code book, and ledger, and (after suitable
explanations) to measure his proficiency in performing the actual
function. Since the entire task is one that the applicants should be 
able to perform without specialized training, it can simply be replicated 
as a whole in the test situation, and the question of component abilities 
never arises. 

Yet, for three major reasons, such direct job-sample tests 
have limited utility as selection procedures. The first is that suffi- 
ciently stable measures of performance are seldom obtained from the 
examinees’ initial attempt at carrying out actual job operations. After
a month or more on the job, the relative proficiency of the applicants 
might be totally different, perhaps because the ability to remember 
the more frequent codes without looking them up (which is not tested 
in the examinees’ initial attempt) is more important than skill in 
copying or addition. The second is that certain of the critical job 
operations--such as noticing sales trends in this example--are not
amenable at all to testing by the direct job-sample approach, and 
must be ignored completely when the selection procedure consists 
of job samples only. And the third is that using unique tests for 
each position is generally inefficient. Even within a single firm, the 
skills necessary for clerical operations in the sales department 
almost certainly overlap with those required for purchasing, accounting,
and inventory control operations; and the use of separate tests for
each position would be extravagantly expensive.

Thus, the identification of the more general abilities that lead 
to success in the specific operations desired is the necessary second 
step in the development of rationales for most practical applications. 

From the job description he has obtained or developed, the test con- 
structor must infer the nature of the major ability components that 
are involved in these operations, trying to define them in such a way
that 


1. The components identified are sufficiently distinct to warrant 
the construction of that many separate aptitude measures, and 

2. Each of the individual definitions strikes the proper balance
between the specificity to this one job that is necessary for accuracy
in this one application, and the generality that will lead to a more 
widely applicable selection procedure. 

A more broadly applicable definition is desirable not only because of 
the greater utility of the new instrument that may be developed from
[...]
of the separate abilities that exist would be listed. Each of the com-
ponent abilities would be defined operationally by the content of a test
that measures it (but no others), and the inventory would be considered 
complete when tests that measure something not already measured by 
the existing set could no longer be constructed. Although much 
theoretical speculation about the organization of human abilities has 
guided and also resulted from this research, the development of a set 
of independent and exhaustive ability tests has in fact been the opera- 
tional objective. 


To develop such tests, one straightforward approach is to admin-
ister a large number of different test items to a large number of
people, and then analyze the patterns of overlap in each examinee's
scores to identify the unique components. And, using the techniques
of factor analysis as the statistical vehicle, this mainly empirical
procedure has been the one generally applied. The research began
with Spearman's discovery that the various kinds of intelligence tests
seemed to consist of one common factor (presumably the "general
intelligence" component) and one specific factor peculiar to the nature
of the test items; and then took another giant step to [...]
Thurstone identified six "primary mental abilities" that seemed to
explain the major differences in examinee performance on 56 different
ability tests. His list included verbal, number, spatial, word fluency,
memory, and reasoning skills, and this [...]
influenced the ways in which tests have been labeled in the thirty
years since it was first developed.
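
The idea of reading ability components out of "patterns of overlap" can be illustrated with a deliberately crude sketch. It is not factor analysis proper: it merely computes the correlations among simulated test scores and groups together the tests that correlate above an arbitrary threshold. The simulated abilities, the loadings of 0.8 and 0.6, the threshold of 0.4, and all function names are assumptions made for this illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def overlapping_groups(scores, threshold=0.4):
    """Group tests linked, directly or through other tests, by correlations above threshold."""
    k = len(scores)
    group_of = list(range(k))  # each test starts in a group of its own
    for i in range(k):
        for j in range(i + 1, k):
            if pearson(scores[i], scores[j]) > threshold:
                old, new = group_of[j], group_of[i]
                group_of = [new if g == old else g for g in group_of]
    groups = {}
    for test, g in enumerate(group_of):
        groups.setdefault(g, []).append(test)
    return sorted(groups.values())


# Simulate 300 examinees with two independent underlying abilities.
rng = random.Random(0)
n_examinees = 300
verbal = [rng.gauss(0, 1) for _ in range(n_examinees)]
spatial = [rng.gauss(0, 1) for _ in range(n_examinees)]


def simulated_test(ability):
    """Scores on one test: mostly the underlying ability, plus test-specific noise."""
    return [0.8 * a + 0.6 * rng.gauss(0, 1) for a in ability]


# Tests 0-2 draw on the first ability, tests 3-5 on the second.
scores = ([simulated_test(verbal) for _ in range(3)]
          + [simulated_test(spatial) for _ in range(3)])
print(overlapping_groups(scores))
```

Each group recovered this way plays the role of a "factor"; a real analysis would in addition estimate how strongly each test depends on each factor, which this sketch does not attempt.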
[...] concern. The objective has been a complete explanation of human
mental behavior. But clearly, the attainment of this objective would 
also solve any practical prediction problem, since the task of the test 
constructor would thereafter consist of simply selecting an appropriate 
subset of tests from an inventory that is all-inclusive. And even 
today, the extensive data concerning the interrelationships among 
different kinds of test items that have been compiled through such 
research is a highly important resource for the test constructor when 
he encounters practical questions like the above in the development
of a test rationale. 


The second line of inquiry into these issues has been the related 
research of practitioners concerned with specific selection problems. 
Here the basic objectives have been to identify the abilities that are
most important to success in the major kinds of educational programs
[...] the first, it seems safe to assume that cultural differences need not
be considered.


Item Specification 

The third step is to decide on the exact content and format of 
the test exercises that will be used to measure the abilities that have
been identified as important. The major objectives here are three: 

1. To devise problems that will provide maximally reliable 
measures of these abilities, given the background of the examinees 
for which the test is intended; 

2. To minimize the number of operations extraneous to these 
abilities that the examinees will have to perform, so that irrelevant 
ability factors are not inadvertently tested; and

3. To design the overall testing mechanics for maximum effi-
ciency in administration, scoring, and related logistic procedures.

Although certain compromises will frequently be required, notably 
in the tradeoff between higher reliability and greater efficiency, all 
three of these factors should be explicitly considered in the develop- 
ment of item specifications. 

In the context of the above clerical job, for example, one of the 
abilities to be measured might be "speed in classifying objects in 
accordance with a logical system of classification." To write suitable 
items, the test constructor will first have to ensure that the classifica-
tion system used in the test is in fact logical for the applicants to be
tested. For, in addition to the fact that the applicants will generally 
not have the specialized knowledge about the firm's line of products 
to permit the use of the actual products in the test items, there is 
also the possibility that they will not even have sufficient knowledge 
about the presumably more familiar classes of objects that the test 
constructor may decide to use instead of the actual products to
eliminate the need for specialized information. However familiar 
they may seem to the test constructor, the classification concepts 
or the essential nature of the objects or the vocabulary used to des- 
cribe them may in fact not be uniformly well understood; if this is 
the case, the first objective of reliable measures will probably not
be attained by the resulting tests. Further simplification may well 
be required. But from the point of view of the second objective, any 
extreme simplifications that would seem to avoid the above difficulties, 
such as using pictures of objects rather than verbal descriptions, or
requiring a simple match of colors and numbers, may also be inappro- 
priate, since these clearly entail extraneous operations. Pictorial 
representations may introduce a requirement for speed in recognizing 
objects from drawings, which is an ability irrelevant to the job; the
use of colors may inappropriately eliminate examinees who happen
to be color-blind but are potentially excellent clerical workers. And
in evaluating all of these options, purely administrative implications,
such as the costs of printing in color, also must be considered.


In seeking guidance on these kinds of issues from the literature 
of past research, the test constructor will find much less pertinent 
information than is available on the other components of the test 
rationale. For explicit written rationales addressed to these points
[...] exception in past test construction, and it [...]

Overall, then, the test constructor in a developing country can
follow standard procedures in identifying the critical requirements of 
a job, and in deducing from these the major types of abilities that 
ought to be measured. But he is very much on his own in deciding on 
the specific test exercises that will best measure these abilities in 
the particular situation in which he is working. 


APPLYING OLD RATIONALES IN NEW LOCATIONS 

Theoretically, rationales for a given job should be developed 
separately for each organization, even within the same cultural setting, 
for, as a result of differences in the specific job situations, some 
variation in the critical requirements should be expected. The relative 
importance of the standard elements of the job will generally not be 
the same in all organizations, and this should (at least in theory) 
affect the choice of abilities measured. 

But in practice, the actual effects of these differences on the 
resulting test instruments are usually too small to warrant an invest-
ment in new rationales. As will be recalled from the preceding dis-
cussion, the test constructor's task in defining the abilities to be
measured is to strike an appropriate balance between the specificity
of the job description and the generality of the underlying ability 
factors; and, in this process of generalization, most of the specific 
situational differences among separate organizations are likely to 
disappear. Although the descriptions of the job developed in the first 
part of the rationale may vary in certain respects, the abilities selec- 
ted for measurement in the second part will normally be highly similar 
if not exactly the same. It has therefore been only when the results 
obtained from the established rationales were considered inadequate 
or when certain aspects of the performance expected were obviously 
quite different from the standard requirements that organizations
in the same cultural setting have invested in new rationales. 

For generalizations of established rationales to different cultures, 
the AID/AIR findings suggest that exactly the same principle should
be applied: i.e., that insofar as the identification of the abilities that
are most important to a certain job is concerned, the difference in
geography should be largely ignored. Because the interorganization
differences in the specific requirements of a job within a single
country (e.g., within Nigeria, within Thailand, or within the United 
States) are larger by far than the corresponding differences among 
these countries, cultural variations are in and of themselves minor 
considerations. Unless certain practices of the firm itself seem to
require a new analysis of such jobs as bank teller, draftsman or com-
puter programmer, the results of past critical requirement studies
can be applied irrespective of the firm's geographic location. Job
analyses made in the United States should be applicable to Nigeria
and Thailand, and vice versa. 


For the vast majority of test applications, therefore, the test 
constructor need be concerned only with the third step of writing suitable 
item specifications. As already noted in Chapter 1, the decision as to
which abilities are the appropriate ones to measure for a given job
can and should be based on the findings of past studies of this job,
here or in other locations. 


Operationally, the development of culturally appropriate item 
specifications can proceed in one of two separate ways. The first 
way is to begin with the test rationales that have been prepared for 
this type of practical application, to retain the description and analysis 
portions, and then to write new item specifications consistent with 
this analysis and with the characteristics of the local applicant group. 
The second way is to begin with the actual test items that have proved
effective in other locations, to retain those features that are equally
suitable here, and then to modify those that are inapplicable to this 
cultural setting. Each way has advantages and limitations. 


The major advantage of beginning with the rationale is that this 
does not automatically predispose the resulting test to a certain format,
and permits the test constructor to exercise maximum ingenuity in
devising suitable items, perhaps resulting in the discovery of items
that are inherently better than the traditional ones for all locations.
The disadvantages are that relatively few rationales have been pub-
lished in concise written form, and that to generate new items from
scratch, the test constructor may need more complete information on 
the background of the examinees than is available in most developing 
countries. 


The major advantage of beginning with actual test items is that
the significant differences in the background of the local examinees
need not be known in advance, but can be determined through systematic
trial and error by using the results obtained when these items are
administered to local examinees to diagnose why they are ineffective.
The disadvantage is that any inherent (i.e., noncultural) limitations
[...] selected for experimentation will probably
[...]. A radically new and [...]

In the case of the AID/AIR research, it was necessary to begin
with the second approach of experimenting with standard test items, 
because at that time too little was known about the nature of the back- 
ground differences that are important to deduce suitable item specifi- 
cations. From a careful analysis of the ability described in the 
rationale, it might have been possible to determine the type of task 
that would provide a reasonable measure of this ability in the local 
culture; but to answer the many strictly mechanical questions that 
arise in trying to portray any task so determined in the form of a 
pencil-and-paper item, experimentation was clearly essential. And, 
therefore, the approach that was adopted was to select a wide range 
of standard test items and to try, as a first step, to convert these 
into locally suitable forms through repeated trial and error.4

Once guidelines for generally appropriate testing mechanics had 
been developed, it was feasible to use also the other approach of work- 
ing directly from the test rationale rather than from a set of actual 
items. The Mechanical Information Test described earlier was one 
of the instruments constructed in this manner. Questions appropriate
for this test were deduced from the rationale, applied to local environ- 
mental conditions, and then converted to pencil-and-paper items in 
accordance with the general guidelines developed in the earlier experi- 
mental research. 

Thus, as one practical result of the AID/AIR studies, the 
development of a new test rationale or the application of a standard 
rationale in a developing country is today as feasible an approach as
the traditional method of adapting the actual items used in other loca- 
tions. By applying the procedural guidelines described in the following 
chapters, the test constructor may apply any of these three approaches, 
in accordance with the pros and cons developed above. 


SUMMARY OF MAJOR SUGGESTIONS 

The major points that have been made about the preparation of 
test specifications for a given practical application may be summarized 
as follows: 

1. Test construction should begin with the development (or 
adoption) of an explicit rationale that links the test items with the 
actual performance to be predicted. 

2. The first part of this rationale should consist of a detailed 
description of the performance to be predicted, written in behavioral
terms and focused on the activities that are most critical in terms
of the performance standards expected.

3. The second part of the rationale should specify the abilities 
that the successful performance of these critical activities is thought 
to require. The definition of these abilities should be sufficiently
specific to the performance that has been described to permit accurate
prediction, but not so specific as to preclude the use of the resulting
test for other similar applications.

4. The third and final part of the rationale should consist of 
detailed specifications for the individual test items to be developed.
These specifications must be consistent both with the abilities defined 
as important and with the examinees' background characteristics. 

5. New rationales should be developed only when those available 
have not provided acceptable results, or when the performance expec- 
ted differs significantly from that normally required in this type of 
position. In all other instances, including applications to new cultural
settings, established rationales should be adopted. 

6. When an established rationale is adopted for use in a develop-
ing country, the description and analysis sections may be retained,
but the item specifications must be revised in accordance with local
conditions. This can be done either by deducing new specifications 
from the initial parts of the rationale, or by evaluating the original 
test items experimentally, to determine the specific changes required. 

7. In the preparation of item specifications, either for an
established or for a new rationale, the guidelines of Chapters 4 and 5 
should be applied. These have proved useful for a wide range of tests, 
and will usually avoid confounding the evaluation of the substantive 
merits of the rationale with the strictly mechanical deficiencies that
can be introduced in the overall design of the test or testing procedure. 
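
The three-part structure summarized above, and the rule for carrying an established rationale into a new setting, can be pictured as a small data structure. This is only an illustrative sketch: the class name `Rationale`, its field names, the helper `adopt_for_new_setting`, and the sample entries are all hypothetical and are not part of the AID/AIR procedures.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Rationale:
    performance_description: str          # part 1: critical activities, in behavioral terms
    abilities: Tuple[str, ...]            # part 2: abilities those activities are thought to require
    item_specifications: Tuple[str, ...]  # part 3: how each ability is to be tested


def adopt_for_new_setting(established, local_item_specifications):
    """Keep the description and analysis; replace only the item specifications."""
    return replace(established,
                   item_specifications=tuple(local_item_specifications))


established = Rationale(
    performance_description=("Copies sales prices from receipts into "
                             "category columns and totals them."),
    abilities=("speed in classifying objects", "accuracy in copying numbers"),
    item_specifications=("classify product names drawn from the firm's catalogue",),
)

local = adopt_for_new_setting(
    established,
    ["classify names of objects familiar to local applicants"],
)
print(local.abilities == established.abilities)
```

Making the record immutable mirrors the point of items 5 and 6: the established description and analysis are reused as they stand, and only the final part is rewritten for local conditions.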


1. A [...] description is given in J. C. Flanagan, "The Use of
Comprehensive Rationales in Test Development," Educational and
Psychological Measurement, XI, 1 (Spring, 1951), 151-5. Flanagan [...]

2. This approach, commonly termed the "Critical Incident
Technique," was also developed by Flanagan; and has since been applied
in more than four hundred job analysis studies. A description of the 
basic steps to be followed may be found in J. C. Flanagan, "The 
Critical Incident Technique," Psychological Bulletin, LI, 4 (1954),
327-58.


3. Factor analysis is a method of analyzing a set of intertest
correlations to determine the number of independent variables that 
seem to be affecting the scores. Each of these variables is called a 
"factor," and by examining the characteristics of the items that led 
to its identification it may be given a "name." Thus, a variable based 
on the interrelationship of a group of items that are all measures of
language skills may be called a "verbal factor"; one based on items 
that all measure visual form relationships may be called a "spatial 
factor"; etc. Having identified these factors, the test constructor
may then try to devise a new test that measures only the verbal factor, 
and one that measures only the spatial factor, and so on, to generate 
a smaller number of more independent tests than were in the set with 
which he began. For a detailed mathematical discussion, such texts 
as H. Harman, Modern Factor Analysis (2nd ed., rev.; University of 
Chicago Press, 1967) may be consulted. 

4. At the time the AID/AIR research began, a national census
of the aptitudes of American high school students, termed Project
TALENT, was also getting under way; we selected items for experimental tryout
largely on the basis of the skills that had been identified as most 
appropriate for this census by the large panel of experts who served 
as Project TALENT consultants. Additional samples of items were 
drawn from the Flanagan Aptitude Classification Tests, the General 
Aptitude Test Battery, and other widely used series of ability tests. 




CHAPTER 



THE DESIGN
OF
SUITABLE
TESTS


This and the following chapter summarize the specific guidelines 
for test construction that were developed as a result of the AID/AIR
studies. This chapter focuses on the design of the test paper itself; 
Chapter 5, on the administrative and mechanical procedures for con- 
ducting the testing session. In both of these components of the testing 
process a number of features different from standard testing practices 
had to be introduced. 


CONTENT OF THE TEST ITEMS 

Three basic requirements relevant to the content of the individual 
test items were developed in the preceding discussions. The first was
that the specific skill required to solve the problem is one that the
examinees have already had an opportunity to acquire. This require-
ment is, of course, fundamental. The second was that all of the infor-
mation that the item is supposed to communicate to the examinee about 
the nature of the problem and the desired solution— i.e., about the 
"givens" with which he is to begin— is presented in a form that he will 
in fact be able to interpret correctly. And the third was that neither 
the interpretation of this information nor the solution of the problem 
requires additional skills that are irrelevant to the intent of the items, 
but that have nevertheless crept into the test and distort the examinees' 
scores. When test items designed for American or European exam-
inees are used without modification to test examinees in a developing
country, one or more of these requirements will generally not be 
fulfilled. 

One common reason for this breakdown is that an examinee who 
grows up in a developing country has not acquired certain of the 
specific items of knowledge that can be taken for granted in the Amer-
ican culture, and is unable to cope with problems in which such prior
knowledge has been assumed by the test constructor. A second and 
even more common reason is that the examinee in the developing 
country is less proficient in extracting the "givens" of the test problem 
from the particular words or drawings that the test constructor has 
used as the medium of communication, and is unable to solve problems 
he actually has the ability to solve simply because of the way in which 
they are presented. In adapting a standard test or in preparing a new 
test directly from the specifications of the test rationale, the first 
step is to ensure that the knowledge, language, and perceptual skills 
the items require are consistent with the background and experiences 
of local examinees. 


Knowledge Factors 
[...] The residue of what is left of a test [...] without special
study, [...] elements have been eliminated [...] inappropriate [...]
adequate basis for a reliable ability measure. A [...] procedure for
generating suitable test content usually [...] developed. The second
related point is that the knowledge elements [...] test that in fact
are inappropriate in this cultural sense [...] at all "obvious." Some
elements, though strange to the [...] in actuality not affect his
performance at all because [...] to the problem he is to solve; and
changing [...] simply because they are strange may [...].
With respect to the first point of selecting content that will lead
to reliable measures, the key limitation is that direct observation of 
the local scene seldom suffices. If the test constructor relies solely 
on the observations of the available information sources from which 
the examinees could learn, he will invariably include some that have 
in reality had no impact on the examinees whatsoever, and miss many 
other less obvious sources that have had a substantially greater effect. 
In the early African studies, it was somewhat surprising to find that 
the typical Nigerian student in those days knew much less about the 
trees or flowers that grew all around him than did the American staff
members' wives, and less about the history of his country in precolo-
nial days than the staff members themselves had learned in their orien-
tation courses. As a function of his inherited educational system, his
knowledge of European flora and history was in actuality considerably
higher. With the growing spread of transistor radios and television, 
the knowledge of youth anywhere in the world is almost certain today 
to hold many even stranger surprises. 

To identify the categories of information suitable for use in a 
test more systematically, one of the following three approaches can 
generally be applied: 

1. The first and safest is to compile examples of materials 
the examinees themselves have produced that indicate the kinds of 
knowledge they actively use in nontest situations. This is the approach
that was used to determine knowledge of words as the basis for design- 
ing the Verbal Analogies Test, as described in Chapter 1. Only when 
examinee-produced materials relevant to the test (i.e., themes in the 
Analogies example) are not readily available should less positive in- 
dications of knowledge be used. 

2. Second best is to examine the content of relevant information 
sources that the examinees have been actively encouraged or instructed 
to use. Textbooks are one such source (applied in the I-D Reading
Comprehension Tests); the daily newspapers (used for the I-D World
Information Test) are another.* Such sources offer fewer guarantees
than those that show the actual acquisition of certain ideas or knowl- 
edge, but do generally result in items that require only limited post- 
tryout revisions. 


*All tests cited in this chapter are discussed fully in Chapter
6, where the developmental procedure that was followed is also 
described. 
3. Least satisfying, from the point of view of the amount of
research required, is to identify stimuli from which the examinees
might have learned (though nothing was done to encourage such learn-
ing), and then to determine empirically to which of these they have
responded. This approach will usually require many sequences of 
tryout and revision, but for some tests (such as the I-D Mechanical 
Information Test) other good options do not exist. Whichever technique 
is applied, the important principle is to use an empirical approach in 
generating the knowledge elements suitable for a given ability test in 
a given cultural setting. 


The second related point applies to those cases in which the 
task is not to generate new items, but to adapt those of an existing 
version. Here the important first step is to ensure that solving the 
test problems really does require adequate knowledge about a certain 
"unfamiliar" component before work to adapt that component is un- 
dertaken. The I-D Checking Test affords a convenient illustration of 
one common situation in which this issue of "to adapt or not to adapt"
logically arises. The task in this test is to inspect the five objects
[...] and to pick out the one that is defective. [...] might be shown,
for example, [...] one of which has a [...]. The question that had to
be asked was whether [...] in the African setting. Arguing against
[...] therefore pick out the "defects" more quickly. To resolve this
issue, empirical data were needed.

[...] contained both familiar and unfamiliar objects [...] to several
classes of sixth-grade students, and the results [...]. The finding
was that there were no differences in performance whatever. Items
based on esoteric [...] the same properties as those based on objects
from [...] knowledge of the drawings was not an important [...]
selecting content for this particular test, and this had [...]
practical advantage, in light of the [...] precision printing, of
permitting a selection based mainly on [...] mechanical reproduction.
To the casual observer, [...] items in this test may seem obviously
inappropriate to the African culture, but they are nevertheless fully
effective.
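
A comparison of this kind can be sketched in a few lines of Python. The sketch is illustrative only: the item names, the response data, and the function names are hypothetical and are not taken from the AID/AIR studies. It computes the proportion of examinees answering each item correctly and then compares the average for items based on familiar objects with the average for items based on unfamiliar ones.

```python
from statistics import mean


def proportion_correct(responses):
    """Share of examinees (scored 1 = right, 0 = wrong) who solved one item."""
    return sum(responses) / len(responses)


def compare_item_sets(item_responses, familiar_ids):
    """Return mean difficulty of familiar items, of unfamiliar items,
    and the difference between the two means."""
    familiar, unfamiliar = [], []
    for item_id, responses in item_responses.items():
        p = proportion_correct(responses)
        if item_id in familiar_ids:
            familiar.append(p)
        else:
            unfamiliar.append(p)
    fam_mean, unfam_mean = mean(familiar), mean(unfamiliar)
    return fam_mean, unfam_mean, fam_mean - unfam_mean


# Hypothetical tryout data: one list of right/wrong scores per item.
item_responses = {
    "hoe":         [1, 1, 0, 1, 1, 0, 1, 1],
    "cooking_pot": [1, 0, 1, 1, 0, 1, 0, 1],
    "toaster":     [1, 0, 1, 0, 1, 0, 1, 1],
    "snow_shovel": [0, 1, 1, 1, 1, 1, 1, 0],
}
familiar_ids = {"hoe", "cooking_pot"}  # assumed classification

fam, unfam, diff = compare_item_sets(item_responses, familiar_ids)
print(f"familiar: {fam:.3f}  unfamiliar: {unfam:.3f}  difference: {diff:.3f}")
```

In a real tryout the difference would also have to be judged against sampling error; a difference near zero across several classes is what would support leaving the unfamiliar content unchanged.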

A second and somewhat different concern about the use of non- 
indigenous test content has been that even if the examinees have the 
knowledge to solve the problems, they will recognize the test as being
a "foreign" one intended for other countries and not put forth the
effort required. Because of the apparent difficulty or unfairness or 
irrelevance of the test, their motivation might suffer. And so the test 
constructor must ask whether a sketch of a house that has a roof 
different from local practice in house construction should be redrawn 
even though the examinees will almost certainly recognize the house 
as being a house from the original drawing; or whether an English 
passage that spells "colour" as "color" should be reprinted, even
though the examinees should have no problems at all in reading the 
passage correctly. Do cultural anomalies that are incidental to the 
solution of the problem itself affect the examinees' performance ad- 
versely? 

In the AID/AIR studies, no adverse effects of this type were
noted. The results on the original and the modified versions typically 
were the same. But because most teachers and administrators are 
understandably reluctant to institutionalize materials that someone 
sooner or later will question, the following compromise approach was 
adopted: 

1. Throughout the feasibility and early developmental studies,
the original version is used, without any of these "cosmetic" adapta-
tions (e.g., the African version is used in Thailand). So long as there
are no substantive changes that have to be made before the test can 
even be tried in a country, cosmetic change is deferred. 

2. Then, at the point at which operational test forms are to be 
printed, the cosmetic changes are made as part of the preparation of 
copy for printing. Since a number of substantive changes normally 
have to be made at this stage in any event, little additional cost is
incurred. 

Reprinting a test for the sake of appearances before its basic utility 
has been verified through tryout studies may well entail needless 
expense. 
Language Factors

In some of the developing countries the language component of 
test construction poses few if any unusual problems. The local 
language is used, the standard techniques are applied, and a culturally 
appropriate test sooner or later emerges. Methodologically, the 
process in an advanced country like England and in a developing country 
such as Korea can and should be the same. 


But this ready generalizability is not universal. In many coun-
tries, such as Nigeria, Mali, or Bolivia, the English, French, or Spanish
that must be used for nation-wide testing is not the first language of 
the examinees; nor is it the language in which they have done much of 
their thinking and learning. The task here is to test youngsters in a 
language that has served more as a subject to study in school than as 
a tool for normal communication; this is so radically different from 
the standard test situation that much of the standard approach to 
language in testing must be adapted.


This chapter describes the adaptation of the content of the test
[...] second language examinee groups, first for tests of verbal
[...] aptitude measures. The many [...] the administration of the
test will be taken up in the following chapter.

The Language Content of Verbal
Ability Measures

[...] (reasoning, comprehension, and expression) [...] most widely
used of all aptitude [...] individual's overall learning potential
and [...] for situations that require learning ability such [...]
consistently proved superior to all other [...] performance. In the
[...] testing program that does not include verbal aptitude tests
would be an unthinkable proposition.

[...] groups who have not attained a certain level of [...]
proficiency, tests of this type will result in a greatly distorted
[...]. Considerable effort has been devoted [...] construction of tests
that will measure the general [...] an individual without the use of
any language content whatever. [...] abstract shapes, [...] spatial
configurations have been applied, and a wide variety of entirely
nonverbal tests, including all of the so-called culture-fair measures,
has by now been developed.

Thus, the first and most basic decision the test constructor in 
a multilingual country must make about general ability measures is 
which of these two different routes to follow. Should he begin with the 
rationale of the verbal aptitude test and try to devise language items
that will be effective despite the verbal handicaps of the groups with 
which he is working? Or should he select the nonverbal approach 
and develop abstract and spatial items? Or must he invest in both 
and defer the decision until comparative data for this particular group 
have been assembled? At the time the African studies began, these
issues were being widely debated. 


At the extremes of the educational spectrum, there was no dis- 
agreement, of course. For students at or above the secondary school 
level, verbal tests were the indicated procedure; for illiterates, the 
nonverbal approach was the only recourse. The question centered on
the "gray zone" of examinees who had perhaps four to six years of
primary school education, and whose language skills (in English or
French) were far below those of their counterparts in the industrialized
countries. Because this group included the examinees at the
secondary school entrance level, the resolution of these issues was
deemed highly important. 


The design of the African research did not permit the controlled
experimentation that would firmly and finally lay [...]. But, for a
variety of logical and empirical [...] concluded that the initial
investment for partly [...] should always be in verbal tests, and that
in [...] effective verbal tests can in fact be constructed. [...]
approach to "intelligence" testing, at least in the context of pencil-
and-paper tests, is likely to be much less effective.


The basic reason for this probably [...] of the two approaches. In
virtually every learning situation language is the essential medium
whereby new [...] will [...] be [...] verbal test replicates this
process more closely [...] possible with a nonverbal procedure. Even
when [...] skills of the examinees are rudimentary, a closer [...]
critical requirements of the criterion [...] achieved. Verbal items
also are grounded more firmly in [...] the examinees have practiced
before than are the essentially [...] that comprise most of the
nonverbal approaches. [...] to [...] cultures by being equally
irrelevant to the activities [...] every culture is not likely to yield
a meaningful ability [...].
A second, more practical reason for choosing verbal tests is
that the substitution of pictures or symbols for words frequently in-
creases rather than eases the problems of test adaptation. None of
the abstract reasoning tests tried in Africa survived even the early
trials, and the one nonverbal test that eventually was developed (i.e.,
the Similarities Test) has been only partly effective. Probably as a
function of the education methods and materials to which these exam-
inees had been exposed, words were a generally more effective
medium for testing than were symbols or pictures. 

And a third reason is that the adapted verbal ability tests did 
turn out to be highly accurate selection procedures. Although the 
development costs were in each case substantial, the cost-effective-
ness of the Verbal Analogies and Reading Comprehension tests is
perhaps the highest of the I-D test series.


The nature of the adaptations required for examinees with limited
language skills is best illustrated by the Reading Comprehension Test,
described more fully in Chapter 6. The first problem was to find para-
graphs [...] simple that most of the examinees can understand
the basic information presented, but that despite this simplicity permit
some degree of inference and interpretation, so that "verbal reasoning"
[...] constructed. The second problem was to determine [...] English
that are normally used [...] questions with native speakers of [...]
examinees who have had only a [...] which of these clues were too
subtle [...] analyses was required [...] tolerance limits for this
[...] narrow, in that a [...] is fully effective.


[...] languages and other countries, it will almost certainly [...]
repeat this entire developmental procedure. Because [...] area were
found to [...] were native speakers of the languages used [...]
language could not be developed. And, indeed, [...] cultural principles
for the construction of these types of tests may [...]. [...] one
important and hopefully generalizable [...] AID/AIR research on verbal
ability tests is that they can [...] with marginally qualified groups,
and [...] should be attempted in a much broader range of situations
than had been suspected.
The Language Content of Other
Ability Measures 


The other types of ability tests that have a significant language 
component are those in which the nature of the operation the examinee 
is to perform changes from item to item, so that he has to be told as 
part of each item what it is that he is to do. In the I-D Mechanical
Information Test, for example, the 56 problems are in effect 56
separate tests, and as the examinee reaches each one, he must be
asked the appropriate question. A sizable language component, en-
compassing at least 56 separate sentences, has to be introduced.


The danger is that so large a verbal component will dilute the
accuracy of the test by turning it into a measure of language skills
as well as of the strictly mechanical aptitude that was intended.
Examinees who have somewhat greater or lesser [...] comprehension
might earn inappropriately higher or lower scores, and so might those
who read a little faster or [...]. [...] when the testing is being
conducted in the first language of [...], suitable precautions have
to be taken.

The normal precautions are to pretest the [...] point of view of
examinee understanding, and to rephrase them as necessary to achieve
uniformly high comprehension. And, in the advanced countries, this can
almost [...]. [...] when a second language has to be used with
examinees [...] a primary school education, such editing typically
[...]. The individual differences in language skills [...] sizable
verbal factor continues to confound the ability measure.

The I-D solution to [...] from printed to oral questions, so that
[...] skills ([...] been found to be the largest source of [...]) are
not involved. The test paper [...] problems represented, in the case
of the Mechanical Information Test, [...] appropriate sketches; and
the associated questions are [...] examiner, who reads them out loud.
The normal precaution of pretesting the questions is still necessary
because the specific words used remain highly important; but it is the
basic change from a written [...] that is the key to obtaining the
high levels of comprehension required.

As a by-product of this technique, [...] for explaining the sketches
and for pacing [...] later be noted. It was one of the most useful of
the I-D testing approaches; and, though designed mainly for use in
[...], may in fact have general utility in most cultural settings. In
Thailand, [...] administrations of the Mechanical Information Test with
written rather than oral questions significantly increased its
correlation with academic performance, suggesting an unduly high verbal
loading, even though the examinees' first language was the medium used.
And it may well be that an oral approach can profitably be applied in
[...] countries to reduce the verbal component of a variety of specific
ability measures.
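
The check described for Thailand amounts to comparing two correlation coefficients. The following Python sketch shows one way such a comparison might be set up. All scores are invented for illustration, and the variable and function names are assumptions; nothing here reproduces the actual Thai data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data for six examinees.
academic = [50, 55, 60, 65, 70, 75]        # school grades
written_scores = [20, 22, 25, 27, 31, 33]  # test given with printed questions
oral_scores = [26, 21, 29, 24, 30, 25]     # test given with oral questions

r_written = pearson_r(academic, written_scores)
r_oral = pearson_r(academic, oral_scores)

print(f"written form vs. academic performance: r = {r_written:.2f}")
print(f"oral form vs. academic performance:    r = {r_oral:.2f}")
```

For a test intended to measure a specific nonacademic aptitude, a markedly higher correlation with school grades under the written form would be read, as in the text, as a sign that reading skill is contributing to the scores.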


This approach is by no means a panacea, however. It cannot be 
applied to complex verbal problems, such as those of a mathematical 
reasoning test, because the examinees cannot absorb the pertinent
information from an oral presentation alone. To reduce the huge verbal
loading of this type of test, an entirely new kind of item that does not
rely on "word problems" at all probably must be developed. But the 
attempts that were made to concoct such items have so far been un- 
successful. 


Perceptual Factors 

If one compares any tangible object with a faithful sketch of the 
same object on paper, he will be able to find dozens of specific charac-
teristics that are far from the same. Certain aspects of the object
have been ignored, others have been distorted, and still others have
been represented by special conventions that the artist has used as
a sort of a code. And yet, most American youngsters could glance 
at the sketch, and immediately name the object portrayed. 

The reason for this, of course, is that objects are sketched in 
accordance with certain set rules that the American youngster has 
learned to interpret. The thousands of visual representations to 
which he has been exposed have so conditioned his responses that he 
not only looks at the sketch and "sees" the object, but is actually 
hard-pressed to identify the discrepancies between the two without 
careful concentration. 


[...] who have not had this specialized training, however,
will generally not have developed these special responses, and will
see in a sketch what is actually there rather than what is intended.
[...] a mouse in the foreground of a drawing will [...] the horizon;
a column of smoke will [...]. [...] visual representations are rare,
these kinds of "more natural" responses are the more likely, and
appropriate adaptations of every pictorial test item will probably
have to be made.

The nature of these adaptations varies from test to test in
accordance with the kind of perceptual skill the items require. Certain
approaches are most useful for adapting test items in which the exam-
inee has to recognize the object portrayed in a drawing in order to
answer the question, certain others for adapting items that are based
on abstract configurations. 

Drawings and Sketches 

In some tests, such as the Checking Test [...] example, it will be
found that the examinee can [...] recognizing the objects that are
represented, and [...] perceptual content is not required. But in
numerous [...] tests, such as the Similarities or Mechanical
Information series, the items are based on the characteristics [...]
portrayed, and the correct interpretation of each sketch is [...]. In
the early African studies, the fact that many [...] recognize even
simple objects from the [...] aptitude tests was one of the first major
hurdles encountered.

One possible approach to this problem [...] core set of drawings
that nearly everyone could [...] correctly, and that could then be used
and reused for [...]. And to this end, a large set of trial sketches
was [...] at a time to two classes of [...] identify the objects
portrayed. Each student was interviewed individually, and could use
either [...] language to name the object or to describe its function.
Then, the percentage of students who identified each object correctly
was [...] index of the relative suitability of these drawings for use
[...].
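
The index described here, the percentage of students identifying each drawing correctly, is simple to tabulate. The Python sketch below is a minimal illustration under stated assumptions: the interview records are invented, the function names are hypothetical, and the 90 percent cutoff is an arbitrary value chosen for the example, not a figure reported in the study.

```python
def recognition_index(interviews):
    """Percentage of interviewed students who identified each drawing.

    `interviews` is a list of (drawing_name, recognized) pairs, one pair
    per student per drawing, where `recognized` is True or False.
    """
    totals, hits = {}, {}
    for drawing, recognized in interviews:
        totals[drawing] = totals.get(drawing, 0) + 1
        hits[drawing] = hits.get(drawing, 0) + (1 if recognized else 0)
    return {d: 100.0 * hits[d] / totals[d] for d in totals}


def usable_drawings(index, threshold=90.0):
    """Drawings whose recognition percentage meets an assumed cutoff."""
    return sorted(d for d, pct in index.items() if pct >= threshold)


# Hypothetical interview results for four students and three sketches.
interviews = [
    ("airplane", True), ("airplane", True), ("airplane", True), ("airplane", True),
    ("banana", False), ("banana", True), ("banana", False), ("banana", False),
    ("single_eye", False), ("single_eye", False), ("single_eye", False), ("single_eye", False),
]

index = recognition_index(interviews)
for drawing, pct in sorted(index.items()):
    print(f"{drawing:12s} {pct:5.1f}% recognized")
print("retain:", usable_drawings(index))
```

Ranking the drawings by this percentage, rather than by how culturally familiar the objects appear to the test constructor, keeps the selection empirical.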

As a result of this study, [...] found, first, as in so many other
[...], [...] the apparent cultural relevance of an object is [...]
predicting the interpretability of a sketch. All [...] recognized the
sketch of the airplane, for example, but very [...] recognized the
banana, probably because of the former's [...] distinctive
characteristics. And it was found also [...] objects which are not
recognized from one type of visual representation can sometimes be
made recognizable by [...], as replacing a head-on view of an elegant
[...] with [...] complex profile drawing. With care, a small set of
generally recognized drawings probably could be assembled.
But the findings showed also that many of the objects that would
be most appropriate for use in an aptitude test could not be made
recognizable by stylistic changes alone. Each time an object that is
not normally seen in isolation was shown (such as one eye, for ex-
ample), very few of the examinees were able to identify it correctly.

And the same was true of all of the sketches that had necessarily to 
make use of special conventions, such as drawings of the sun, lightning, 
or fire. None of these very common and therefore very desirable 
phenomena could be used in a straightforward pictorial test.


The solution that was eventually developed was based on the 
fact that in both of the I-D pictorial tests the questions are presented
orally by the examiner, for the reasons noted in the discussion of 
language factors above. It was entirely feasible, therefore, to have 
him name the objects shown in each item as part of the oral question; 
and this method of visual-plus-verbal presentation was found to be 
fully effective. It solved the problems of object recognition not only 
for items that are based on discrete objects, such as those of the 
Similarities Test, but also for items that show more complex scenes
or actions, as a number of the Mechanical Information questions 
require. 


In some of the developing countries these difficulties are less 
acute, and naming the objects shown in simple sketches is not essen-
tial. But when a test is to be presented orally in any event, this
technique may still be a worthwhile addition, as an economical fail- 
safe procedure. 


Size, Shape, and Spatial Orientation 


In a number of perceptual aptitude tests, the examinee has to 
perform operations that depend on the dimensions or attitudes of 
abstract configurations. The essential concept is that two or more 
shapes must be absolutely identical with respect to these character-
istics. The problem that arises at many locations is that the con-
cepts of geometric identity or congruence are not adequately developed.

[...] which arose in the Figures Test, in [...] shapes is [...]
examinees [...] in drawings. For [...] countries, this is a relatively
simple task, and [...] could not solve standard [...].
The reason for this is that, strictly speaking, there is no
possible way of faithfully reproducing a three-dimensional image on
a piece of paper. Depth as it actually looks in the real world cannot
be shown. What are commonly called three-dimensional drawings
are in fact two-dimensional representations that suggest the third
through a special convention that artist and viewer have agreed to
use as a code. A viewer who has not learned this convention will see
the image as it actually is, i.e., perfectly flat.


This raises two types of problems for the test constructor who 
is trying to develop a comprehensive set of aptitude measures. The 
first is that certain kinds of test questions, such as those of a mechan-
ical aptitude test, are normally communicated to the examinees by
three-dimensional sketches; and if he is denied perspective drawings
as a medium of communication, these kinds of items cannot be used.
The second is that the skill of three-dimensional visualization is itself
a highly effective predictor of success in most technical occupations;
if he cannot obtain an index of this skill in the abstract, there will
be a serious gap in the comprehensiveness of the measures. 


[...] problem of using perspective drawings [...] ability factors,
little if anything can [...] constructor cannot devise an alternative
way of [...] have to be dropped from [...] decided to measure in [...]
could not be included simply because [...] communicating them to the
examinees. And such practical [...] override the [...] relevance or
expected predictive power of a proposed test item.


[...] a relatively simple version of an important skill is adequate for
purposes of the test, but will not prepare him for the more complex 
versions that may be required in the course for which he is being 
selected. He could easily earn a top score on the Boxes Test and yet 
be completely baffled by a normal textbook illustration. As suggested
in an earlier chapter, the need for test adaptation usually is sympto- 
matic of a need for comparable revisions in the standard training 
procedures. 


FORMAT OF THE TEST PAPER 

The single most important difference between examinees in the 
advanced and the developing countries lies not in their relative 
sophistication about the specific content of any one aptitude test, as 
in the cases described above, but in their relative familiarity with 
the testing ritual as a whole. The ease of administering tests in the 
United States tends to obscure how complex a skill effective testman-
ship has become and how much highly specialized training the standard
process of testing requires. In Africa it quickly became apparent that 
just following the mechanical instructions can be more challenging to 
an inexperienced examinee than the problems posed in the actual items. 

Thus, teaching the examinees the mechanics of each test is an 
essential part of the testing procedure. And since such teaching takes 
time, it is clearly desirable to simplify these mechanics as much as 
possible in the design of the test’s physical characteristics. Opera- 
tions extraneous to the effective use of the test should be avoided. 

The problem lies in deciding which steps are extraneous and 
which are essential. For most of the steps of the standard process 
were introduced as a means of reducing the costs of testing, and this 
cost factor cannot be ignored. Appropriate compromises between 
simplicity and efficiency must be developed. 

The format of the I-D test papers reflects the compromises that 
seemed, on balance, to be the most cost-effective. They were
introduced specifically for examinees who have had minimal education and
little or no prior experience with aptitude tests; but a number of them 
may well have more general utility in all countries and at all educa- 
tional levels. 


The Use of Reusable Booklets 

A number of the standard aptitude tests which were tried 
experimentally at the beginning of the research are published in the
form of reusable booklets which contain the test questions and the
alternative answers. The examinee is to read the question, to select 
one of the options, and then to mark his choice on a separate sheet 
that can be scored by machine. This answer sheet is "consumed," 
but the booklet can be used again in subsequent sessions. And, since 
one booklet and one answer sheet will accommodate several tests, 
extremely low material and data processing costs are incurred. 

In Africa this highly efficient approach did not survive even the 
first set of field trials, however. The examinees had great difficulty 
in finding the place at which each test begins, in determining when to 
turn or not turn the page, and in coding the answers from the booklet to 
the separate answer paper. Time limits were nearly impossible to 
enforce; attempts at proctoring were ineffectual; and so preoccupied
were examiner and examinee with the mechanics that the substance of 
the test was reduced to an incidental. A tightly structured test-taking 
process that the examiner could control in step-by-step fashion 
clearly was essential. 


To obtain such control, it seemed reasonable to pay any price
except that of giving up the capability for machine scoring. This led
naturally to the idea of printing the test questions directly on
machine-scorable answer sheets, using a separate answer sheet for each
of the different tests. Each answer sheet would be distributed at the
time the examinees were to begin the test, and collected a specified
number [...] process would be repeated for each of [...] administered
during the session. At no time would the [...] a single sheet of
paper, and the [...] should [...] all of the items [...] same sheet,
so that [...]


Printed Instructions

Another feature of the I-D test format that will be noted in 
Figure 1 is the absence of printed instructions. There are no explana- 
tions of the task, no sample problems, no "Go on to next column" or 
"Stop here" directions. The word that indicates where the examinee 
is to write his identification number is the only printed text on the 
entire page. 

The reason that printed explanations are not used is that these
introduce a sizable verbal ability factor into this type of test, as noted
in an earlier discussion. All explanations are given orally by the
examiner, in accordance with the procedures outlined in Chapter 5.
Printed "Stop" or "Go on" directions are not used because these were 
found to be generally ineffective. Instead, the tests are designed so 
that the examinee is to work every problem on the page before him 
without pause in every I-D test; and once the examinees have been 
taught this rule (which is in any event consistent with their natural 
inclinations), the need for special directions in the midst of a test 
never arises. 

The sample and practice problems are printed on a separate 
sheet, distributed and used as part of the pretest explanations. This 
serves mainly as a means of ensuring that all examinees begin the 
test at the same time, which had been found to require an excessive 
number of proctors when the practice problems and test problems 
are printed on the same piece of paper. Similarly, the printing of 
Part I on the front and Part II on the reverse side of the sheet 
simplifies the enforcement of the time limits for these separately 
timed halves, since one proctor can easily monitor a group of 40 
examinees if his task is only to ensure that no one turns over the
paper. 


Marking of the Responses 

Integrating the problems and answer spaces on the same sheet 
eliminates the coding task inherent in reusable test booklets, as 
earlier noted. The alternative answers are printed directly below 
the problem to which they apply, and the common mistake of entering 
the correct answer in the wrong place is thereby avoided. 

In most of the I-D tests, moreover, this task is even simpler 
than in the Figures example. The items of the Figures Test do
require the examinee to relate the answer spaces that are labeled
only with capital letters to the key at the top of the page, and this
extra step does require additional explanations. In most of the other
tests, each of the five answer spaces is printed below the actual
answer to which it refers (rather than below a capital letter), and the
task is reduced to one of simply underlining the answer.

Such flexibility in positioning answer spaces virtually anywhere 
on the sheet to fit the format of a particular pictorial, symbolic, or 
verbal test problem was not possible a few years ago. The sensors
of the mark-sensing machines formerly used would not have been
able to trace the many different patterns of answer spaces the I-D
tests require. But with optical scanners (such as the IBM 1230 or
1231, used in the later African studies) the tolerance limits are
broad, and each of the multiple-choice tests of the I-D series could
be cast into a format that these machines are able to score. The only
important requirement was that a special type of ink had to be used in 
printing the questions, so that only the marks the examinee makes 
would register on the machine. 


The one desired response that the examinees were not able to
make correctly was to code their ten-digit identification numbers into
the answer space format at the top of the page, which is necessary
whenever the test is to be scored by machine. One problem was that
[...] detailed explanations. A second was that coding had to be done
on both sides of the paper, which [...] examiner's control over the
time limits when the [...] Accordingly, this operation was performed
[...] clerical staff before the papers were fed through [...] the
examinees' names were [...] a testing session, the computer was used
to preprint the [...]


Uniformity of Separate Tests 

[...] separate tests [...] the same standard format is highly
desirable [...] to follow [...] to adopt, but sudden changes in a rule
that they have already learned are extremely disruptive.


SUMMARY OF SUGGESTED TEST 
CHARACTERISTICS 

Eight major topics related to the content and format of an 
aptitude test were discussed in this chapter. They may be summarized 
briefly as follows:


Knowledge Factors 

1. What is left of a standard aptitude test after the culturally
inappropriate elements have been eliminated will generally not serve
as an adequate ability measure. New content specifically appropriate
to the local culture should be developed, using one of the three
methods suggested.

2. Before undertaking such adaptations, however, the inappro- 
priateness of the original content should be verified by empirical 
studies. Considerable savings can be effected by deferring purely 
cosmetic changes until the initial tryouts have been completed. 


Language Factors 

3. As an index of general learning ability, verbal tests are 
generally superior to culture-fair methods, even for second-language 
examinees. The development of such tests may be quite costly, how- 
ever, because of the many highly specific local adaptations that will 
be required. 

4. In tests that use printed language as a vehicle for measuring
nonverbal skills, this incidental but substantial verbal component may
significantly distort the scores. Replacing printed instructions with 
oral instructions is frequently an effective and economical solution. 


Perceptual Factors 

5. An oral approach may also be used in pictorial tests when
the examinees are unable to interpret the drawings. If the test is
paced by the examiner in any event, he can simply name each of the
objects portrayed as an integral part of the instructions.

6. In tests that require perceptual skills unfamiliar to the
examinees, these skills must be taught as the first step of the testing
procedure. This can be done by simplifying the task, developing an
appropriate instructional sequence, and providing the examinees with
feedback on their performance during a practice session.


Design of the Test Paper 

7. When the examinees are not highly experienced in taking
tests, the standard format must be revised to reduce the number of 
mechanical operations they will have to master. Printing the questions 
directly on a machine-scorable answer sheet will provide most of these 
necessary mechanical simplifications. 

8. Each of the separate tests of the series should be designed
to fit (front and back) on a single sheet of paper, so that the examinees
will have to manipulate no more than one sheet at a time. This can be
done for virtually any type of aptitude test.

As important as the design of the actual test paper is the development
of effective administrative procedures. This is the topic of Chapter 5.



CHAPTER 5


THE DESIGN OF EFFECTIVE TESTING PROCEDURES


The instructions that accompany the standard aptitude tests were 
painstakingly developed, and it would be unfair to characterize them 
as anything but detailed and complete. The examiner is told exactly 
what he should do from the time the examinees enter the room; exactly 
what he should say before, during, and after each test; and exactly 
how he should respond to the various mishaps that sometimes arise. 
The examinee is told when, where, and how to enter his name and 
other biographical data; how to work the problems and mark his 
responses; and even how to obtain a maximum score. For most groups, 
the procedure is much more complete than it actually needs to be— a 
comfortable margin of safety has been provided. 

Yet, as was earlier noted, the effectiveness of this process 
depends on two important characteristics of the examinees for which 
the standard tests are intended. The first is the examinees' familiarity 
with the general ritual of modern testing. Except for certain minor 
differences, the test they are taking today is "like all of the others" 
they have been given on so many prior occasions. And the second is 
their facility in understanding and following printed instructions, for 
the power of the printed word to elicit a desired response has been 
the key to the efficiency of packaging that has made the testing of 
many millions of individuals per year routine in the American culture. 

For examinees who have neither of these two characteristics, 
procedures that are still more complete and still more detailed must 
be developed. The first step is to revise the format of the test so that 
it can be administered, as described in the preceding chapter. The 
second is to devise the methods whereby this actually will be done in
a "live" testing session.


TEACHING THE TEST OPERATIONS

Even after an aptitude test has been simplified to its barest 
essentials, it will still require the examinee to perform a fairly large 
number of skilled operations. Some will be substantive, inherent in
the process of solving the problem; some will be strictly mechanical,
imposed by the logistic demands of group testing and economical
scoring. Both sets of skills must be taught as an integral part of the
pretest instructions. 


The I-D teaching procedures rely mainly on demonstration, 
visual aids, and supervised practice sessions. For inexperienced 
examinees with limited language skills, all three proved essential.


The Demonstration Procedure

[...] distinguishing the "sample problems we shall work now" from the
"practice problems we shall work later," and naming all of the test's
unique features, such as "the graph that will help us to find the
answer." Before the examinees' attention can be focused on any one
part of the test, their natural curiosity about its many strange
features must be dispelled, and even the briefest of orientations will
significantly increase the attentiveness of the group to the
subsequent more detailed explanations.

In addition to its utility as a general familiarization device, 
this initial orientation also serves the highly important function of 
showing the examinee just what is meant by a "test problem" in this 
particular test. The radical change in the appearance of the problem- 
unit to which he is to respond as he goes from one aptitude test to the 
next is a major source of confusion for an inexperienced examinee. 

In the I-D tests, for example, a "problem" sometimes consists of five 
drawings, sometimes of two small squares, sometimes of a path that 
covers half of the page. If the examinee is not shown explicitly the 
nature of the units into which each of these tests is divided, the 
demonstrations will not be fully effective. In this first step of the 
procedure, defining the problem units is the key teaching objective.

Sequence of Instruction 

The sample problems that follow this general orientation should
be designed to teach the task in logical chunks, beginning with the
essential concept and then adding the others more slowly. The first
sample problem is especially important since many examinees will
"tune out" if it seems too complex or obscure. For tasks that are
inherently complex, ways of reducing this complexity for purposes of
the first sample problem may have to be found.

One useful technique is to split the central concept into a number
of separate components and to teach these piecemeal, using two or
three separate problems. This was the approach used in the I-D
Similarities Test, which is intended for semiliterate groups and
therefore requires especially simplified explanations. The basic task
in this test is to discover the characteristic that is common to four
of the five objects shown in each item, and to indicate which one does
not belong in the group. The problem encountered was that the
examinees, having been told to mark the one that is different, would
look for the unique characteristic of the one exception rather than
the common characteristic of the similar four, and (because each of
the five objects invariably had at least one unique characteristic)
would become hopelessly confused. The difference between the operation
required to solve the problem (to find the four) and the operation
required to mark the answer (to check just one) created a conflict
situation that the initial set of sample items did not resolve.

Accordingly, the teaching of the concept was split among several
sample items. The first sample problem contains four drawings that
are not only similar but identical, and the concept taught is that
"four are the same." Then, the second sample extends this concept to
the one that is actually desired by showing four pieces of fruit
which, [...] "in one way the same." Then, more [...] introduced in the
subsequent sample [...]

Insofar as possible, the instructional sequence should rely on
physical demonstrations— i.e., on visible actions of the examiner— 
rather than verbal explanations. If the examinees are able to see what 
they are to do, the instructions will be considerably more effective; 
the test constructor should try to find ways in which even "mental"
tasks can be reduced to actions the examinees can observe. 

For some tests, physical demonstrations are the natural mode 
of explanation, and no special efforts are necessary to reduce the 
task to one that can be demonstrated concretely. The I-D Dexterity 
Tests, which are based strictly on the physical action of tracing a 
path with a pencil, are one clear-cut example. But for other tests,
considerable trial and error may be required to develop a demonstra- 
tion procedure from which the examinees will in fact be able to learn. 
And, interestingly, the solution may turn out to be quite complex— as 
in the chalk-talk, solid models, and folding patterns that Boxes requires 
—as in the demonstration based on two fingers that was found most 
effective for Table Reading. 

The descriptions of the tests in Chapter 6 will provide a number 
of additional illustrations of the kinds of physical demonstrations that 
can be developed to teach tasks that require primarily mental operations. 
For inexperienced examinees, the research investment necessary to 
develop such procedures is highly worthwhile. 

Oral Explanations 

The ideal role of the examiner's oral commentary is that of a 
strictly supplementary explanation of procedures taught mainly by 
physical demonstration. But this ideal is seldom achieved. Certain
aspects of a mental ability test simply cannot be shown, and the
teaching burden for these aspects must be carried by an oral
explanation alone. When this is the case, even more trial and error
than is necessary for the development of the demonstration techniques
typically will be required.

One problem is that the use of a single word or phrase in the 
course of the commentary can mean the difference between success 
and failure, especially for second-language examinees. The I-D 
Coding Test is a case in point. As noted in Chapter 6, the
instructions for this test were not fully effective until the examiner
began to use two entirely different words to designate the squares
that contain the stimulus figures and the (identical) squares in which
the examinees [...]

[...] to each testing location, and durable enough to withstand frequent
transport and use under unfavorable climatic conditions. They must 
be reusable, even though the examiner will write on them as part of 
each demonstration, and they must be fairly cheap to produce. Design- 
ing suitable visual aids is itself a quite challenging task of product 
engineering. 

To provide a high degree of reusability at reasonable cost, 
plastic laminates proved to be by far the most satisfactory approach 
of those that were tried. The mock-ups are printed on regular paper 
and then encased in thin plastic sheets by a process similar to that 
used for such other common laminates as identification cards or credit 
cards in the United States. The resulting visual is light but sturdy, 
and the marks that the examiner makes on its surface during the 
administration of the test can be rubbed off with a cloth after the 
session. In quantity, the cost of production is low, and each visual 
can be used for five years or longer with no sign of wear. 

To display the visuals, use was made of the "hook 'n' loop"
nylon tape that was just then coming into popular use. The display
board (which doubles as a carrying case for the visual aids) is covered
with the "loop" tape, and to the back of each visual a few strips of the
"hook" tape are affixed. When the visual is placed against the board,
it stays in place until pulled off by force. This had the advantage not
only of simplicity but also of making it possible to cut some of the 
larger mock-ups into smaller pieces for transportability and then 
easily to reassemble them on the board. 

The use of this tape was first suggested by the requirements of 
the Boxes Test, in which the examiner must be able to remove the 
patterns from the display, fold them, and then replace them as an 
integral part of the demonstration procedure. It continues to serve
this purpose, in addition to its more general utility for all other types 
of aptitude tests. 

The visual aids used for the Figures Test are illustrated in 
Figure 2. 


The Supervised Practice Session 

The practice session that follows the demonstration is another 
essential element of the teaching procedure. It should be designed to
serve three major functions.


Examiner's Instructions for the I-D Figures Test

MATERIALS NEEDED

For each student:

One Green practice paper number 10

One Test Paper

Two pencils and eraser

For the examiner:

One Timer with bell

Five Display Sections for the Figures Test

Five yellow cut-outs

One Display Board

Two black or red marking pencils (china marker)


PROCEDURE

1. Set up the display board.

2. Students turn to the green paper number 10.

3. Examiner explains the four sample problems.

4. Group does the four practice problems in one minute.

5. Examiner puts problems 5, 6, 7, and 8 on the display board.

6. Examiner explains and demonstrates the practice problems.

7. Group puts away their green books.

8. Test papers are distributed.

9. Each examinee writes his number on the test paper.

10. Part One is administered in five minutes.

11. Part Two is administered in five minutes.

12. Test papers are collected.


Examiner's Instructions


FIGURES TEST


1. DISPLAY BOARD

Set up the display board exactly as shown
at the right. Be sure that the display
board is placed high enough for everyone
to see.


2. INSTRUCTIONS

Everybody take your green book. (Pause)

Find paper number 10. It looks like this
paper in my hand (Hold up a copy of page
10 for all to see). The front of your paper
looks like this (Point to visual).





At the top of the paper are five pictures, A, B, C, D, and E (Point to
each). Under the pictures are problems . . . Problem one, Problem
two, Problem three, and Problem four (Point to each). These are
pictures (Point) and these are the problems (Point).

3. SAMPLE PROBLEMS 

Now, look at problem one. (Pause) Inside problem one . . . somewhere
inside here (Point) . . . there is hidden . . . there is hidden . . .
one of these five pictures (Point to the pictures). Only one. One
of these five pictures here (Point) is hidden inside problem one.

It is exactly the same size, exactly the same shape, and it has not
been turned in any way. It looks exactly the same inside problem
one (Point), as it looks up here (Point).

Which picture is hidden inside problem one? Is it A, B, C, D, or
E? (Get group response.) All right, let us check. Look at C.

(Pick up cut-out and superimpose on C.) It fits exactly inside
problem one (Superimpose on problem one). It is exactly the same
(Keep moving cut-out from one to the other while talking). It is
exactly the same size, exactly the same shape, it has not been
turned in any way (Turn figure). Not turned in any way. The answer
is C. (Put down cut-out.)




[...] four. (Mark visual.) Everybody mark D for number [...]

Put your pencils down!


4. PRACTICE PROBLEMS

[...]

Let us check on the answers.






Problem 5. Which picture is hidden? (Get group response.) That's
correct, it is A. (Demonstrate with cut-out.) It is not C. C will
not fit (Demonstrate).

Problem 6. Which picture is hidden? (Get group response.) That's
correct, it is C. (Demonstrate with cut-out.) It is not E. E will
not fit (Demonstrate).

Problem 7. Which picture is hidden? (Get group response.) That's
correct, it is D. (Demonstrate with cut-out.) It is not B
(Demonstrate). It is not E (Demonstrate).

Problem 8. Which picture is hidden? (Get group response.) That's
correct, it is A. (Demonstrate with cut-out.)

Everybody take your green books and put them under your seat, 
(Proctors check.) 


5. TEST PAPERS

Proctors distribute the correct number of test papers to the first
examinee in each row. The first examinee walks down the aisle
and gives one paper to each person in his row. Check that papers
are being passed quickly.

Does everybody have a test paper that looks like this? (Hold up a 
copy of the Figures Test for the class to see.) 


6. NUMBER

At the top of your paper is the word number (Point). Everybody
find the word number. Now take your pencil and write your number
on the line next to the word number. (Proctors check.) Everybody
write your number.


7. PART I

We will now do Part I of the test.
It has 20 problems and five minutes
of time. If you finish before
the time is up, put your pencils
down and wait. You are to do these
20 problems (Point) and no more.
Pencils up! Everybody begin!
(Start clock.)

Allow exactly five minutes. Check
that no one turns his paper over.
Pencils up! Everybody stop!
(Ensure all stop and keep pencils
up.) Part I is finished.




8. PART II

We will now do Part II, which has 20 problems more. Turn your
papers over to the back. You will do these 20 problems and not go
back to Part I. Everybody begin. (Start clock. While examinees
work, set up visuals for the next test, if any.)

Allow exactly five minutes.

Pencils up! Everybody stop!

Help proctors collect papers quickly.


The first is to give the proctors an opportunity to check each
examinee’s performance as he works on the practice items and to 
provide any additional clarification that may be required. Such individ- 
ual checks are necessary in the developing countries because most 
inexperienced examinees will not voluntarily admit that they are con- 
fused, and the standard approach of ending the explanation by simply 
asking for questions is seldom effective. Looking at an actual sample
of each examinee’s work— which an experienced proctor can learn to 
do at a glance while "patrolling" the aisles during the practice session— 
is the only realistic way of verifying that the instructions have been 
understood. 

The second is to provide the examinees with feedback on their 
individual efforts in solving the problems, as a basis for more meaning- 
ful follow-up explanations. After the examinees have completed the 
practice exercises, the examiner should go over these items and 
explain the answers intended, so that each examinee can identify (and 
hopefully profit from) the errors that he has made. From a teaching 
point of view, such feedback on problems the examinee has already 
tried to solve on his own is an important adjunct to the initial
demonstration, in which the problems are worked jointly by the entire group.

The third and perhaps most important function of the practice 
session is to teach the examinees the rate of speed at which they must 
work to obtain the best possible score. The timing of the practice 
exercises should be proportional to the time that will be allotted during 
the actual test, so that the examinees can actually experience the 
pace that will be expected. Simply announcing that "you will have five 
minutes for thirty problems" is a quite meaningless form of guidance 
for examinees not accustomed to aptitude tests, and so is the standard 
caution to "work as fast as you can without making mistakes." Here, 
as at all other stages in the instructional sequence, the examinees 
have to be shown by concrete demonstrations. 
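
The proportionality rule for timing the practice exercises can be illustrated by a short calculation. The sketch below is written in Python, which is not part of the original procedures, and the function name is hypothetical. The figures are those of the Figures Test instructions, in which each part allows five minutes for 20 problems and the group does four practice problems in one minute.

```python
def practice_time_seconds(test_seconds, test_items, practice_items):
    """Return a practice time limit that matches the pace of the actual test."""
    if test_items <= 0 or practice_items < 0:
        raise ValueError("test_items must be positive and practice_items non-negative")
    seconds_per_item = test_seconds / test_items  # pace expected during the test
    return seconds_per_item * practice_items

# Figures Test: five minutes (300 seconds) for 20 problems, four practice problems.
figures_practice = practice_time_seconds(300, 20, 4)
print(figures_practice)  # 60.0, i.e. one minute
```

At this pace the examinee has fifteen seconds per problem in both the practice session and the test, so the practice session itself demonstrates the speed required.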

To serve these three functions, the practice session has to be 
considerably longer than that typically included in standard aptitude 
tests. One or two practice problems are seldom enough. And, as a 
result of this extended practice session (and of the detailed
demonstration that precedes it), the I-D approach does require more
time than the standard testing procedures. But the twenty-four minutes
of total testing time that the average I-D test requires is thought to
be entirely reasonable for measuring novel skills in populations
seldom if ever tested before.


Criteria for Evaluating the
Teaching Procedure


A final point on the development of an adequate teaching procedure 
concerns the criteria by which the effectiveness of a trial procedure 
can be assessed. In all of the approaches that have been suggested, 
the test constructor has eventually to put his ideas to the test by
trying them in a live testing session and by determining the degree to
which they actually communicated the essential concepts and operations
to the examinees. The question is: How can he tell? How can he
differentiate the examinee errors that are attributable to a faulty
teaching procedure from those that reflect deficiencies in the content
of the test or in its format or in any of the other structural
features considered in Chapter 4? In the standard analyses of the
results of a test, the [...] these different kinds of shortcomings are
intermingled.

[...] but provide increasingly sharper results [...]

[...] distribution. When the procedures have not been adequately
understood by a part of the group, the shape of the distribution will
generally have two humps rather than the one normally expected, for
the scores will pile up not only around the mean but also at the
chance level at which those who did not understand will perform. And,
though there are other kinds of deficiencies that can lead to the same
phenomenon, inadequacies in the teaching procedure are by far the most
probable explanation in the developing countries.
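
One rough way of looking for the second hump is to count how many examinees score close to the chance level. The sketch below is only an illustration in Python and is not a procedure taken from the I-D research. The function name, the width of the band around the chance score, and the scores themselves are all hypothetical, and it assumes number-right scoring on a multiple-choice test in which every item has the same number of options.

```python
def chance_level_share(scores, n_items, n_options, band=2.0):
    """Fraction of examinees whose score lies within `band` points of the chance score."""
    if not scores:
        raise ValueError("at least one score is required")
    chance_score = n_items / n_options  # expected score from blind guessing
    near_chance = [s for s in scores if abs(s - chance_score) <= band]
    return len(near_chance) / len(scores)

# Hypothetical scores on a 40-item, five-option test (chance score = 8).
scores = [7, 8, 9, 8, 6, 24, 26, 25, 27, 23, 28, 22]
share = chance_level_share(scores, n_items=40, n_options=5)
print(round(share, 2))  # 0.42
```

A sizable share near the chance score, alongside a second cluster around a much higher mean, would suggest that part of the group did not understand the procedure. What counts as "sizable" is a judgment the test constructor has to make for each test.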

A more accurate criterion still is an index based on the nature
of the examinees' answers to the individual test questions, i.e., on the
kinds of mistakes that they made. Two examples of such indexes are
given in Chapter 6 in the discussions of the Boxes and the Verbal
Analogies Tests. The former is based on a certain subset of items
that every examinee who has understood the concept should be able to 
answer correctly, the latter on the one response in each set of five 
answer options that would logically be selected by an individual who 
does not understand the basic idea. These kinds of indexes are useful 
not only during the developmental stages but also in an operational 
testing program, to monitor the quality of administration at the various 
field locations. 
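
The two kinds of index can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the indexes actually defined in Chapter 6: the function names, the choice of marker items, and the sample responses are all hypothetical.

```python
def missed_marker_items(correct_flags, marker_items):
    """Count the marker items (by position) that an examinee answered incorrectly."""
    return sum(1 for i in marker_items if not correct_flags[i])

def naive_choice_rate(responses, naive_options):
    """Share of items on which the examinee chose the option that someone
    who missed the basic idea would logically select."""
    if not naive_options:
        raise ValueError("at least one item is required")
    hits = sum(1 for chosen, naive in zip(responses, naive_options) if chosen == naive)
    return hits / len(naive_options)

# First kind of index: items 0, 1, and 4 are assumed to be answerable by
# anyone who has understood the concept.
correct_flags = [True, False, True, True, False, False]
print(missed_marker_items(correct_flags, marker_items=[0, 1, 4]))  # 2

# Second kind of index: compare each response with the "naive" option.
responses = ["A", "C", "C", "E"]
naive_options = ["A", "C", "B", "E"]
print(naive_choice_rate(responses, naive_options))  # 0.75
```

Averaged over the examinees tested at one location, either value gives a simple figure that can be compared across field locations to monitor the quality of administration.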


CONTROLLING PERFORMANCE 

Another essential requirement of effective testing is that the 
conditions under which the various examinees take the test be
sufficiently similar to ensure that the relative magnitudes of their scores
are in fact a meaningful index of their respective potential. No one 
should enjoy an unfair advantage; no one should be handicapped by an 
unfair constraint. In the jargon of test construction, reasonably 
standardized testing conditions must be maintained. 

The unique problems of standardization that arise in the
developing countries lie not in ensuring that the examiner and
proctors will make the same inputs to all groups at all testing
locations. This is a universal problem in all countries, and the same
types of precautions (i.e., adequate training, detailed manuals, and
statistical checks of the scores) should be applied. Rather, the
unique problems lie in adequately "programming" the behavior of the
examinees to ensure that all of them will abide by the explicit rules
and regulations, and that all or most of them will use roughly the
same test-taking tactics in their individual efforts to obtain a
maximum score. To attain reasonable uniformity in individual
performance, a variety of specific controls usually must be applied.


The Explicit Test Regulations


In all aptitude tests, there are certain explicit rules that all of
the examinees are expected to follow. Some of these are general
test-taking rules, such as not beginning until the signal is given; not
completing Part I when Part II is to be started; or not returning to the
uncompleted Part I problems later, even when there is time to spare.
Others are specific to certain types of items, such as not erasing and
changing responses in the case of the Manual Dexterity Test. All such
rules must be explained as part of the initial instructions, and then
enforced as strictly as possible throughout the session.


The problems of enforcing these basic rules derive mainly from [...]
Although some deliberate cheating does, of course, also [...]
deviate from the prescribed regulation. To enforce the rule against
changing answers on the dexterity tests, the simplest solution is
to provide the examinees with pens or pencils that have no erasers.
To prevent them from consulting their practice papers after the actual
test is begun, these papers can be collected; or, to save time, the
examinees can simply be asked to sit on these papers, as is actually
done in the I-D testing procedures. Many ideas for improved controls
will occur to the examiner who actively looks for deviations during
each testing session, and then asks, "How can these be prevented?"


Tactical Variations 


Some of the individual differences in test-taking behavior are 
virtually impossible to control. If an examinee decides to work the 
last problem in the array first or to do the array in horizontal rather 
than vertical order, the chances that this will be discovered and
corrected in time are essentially nil. Monitoring performance at the
level of the individual test items is simply not practical in a group 
testing session. 


What the test constructor can and should do, [...] ensure that the
tactics which are taught as the "normal" procedure are in fact those
that are likely to result in a [...] If the tryouts show that certain
of the examinees [...] and thereby obtained inflated scores, the test
should [...] to preclude these tactics or to adopt them as the normal
test-taking procedure; for, even though this will not [...] number of
examinees who will deviate from the optimum tactics is likely to be
appreciably lower.

A case in point is the I-D Arithmetic [...] simple addition,
subtraction, multiplication, [...] The initial version that was
prepared presented [...] problems in scrambled order, so that the
score [...] including those who completed only a small [...] [sam]ple
of performance on all four operations. But in the tryouts the proctors
noted that many examinees skipped [...] harder problems and worked the
problems based on the [...] rather than following the numerical [...]
though these examinees did not [...] higher scores (the data on this
were not clear), it [...] to try to reduce this large individual
variation by unscrambling [...] presenting the four kinds of problems
in four [...] more "natural" sequence of addition first, subtraction
second, multiplication third, and division last was followed in
subsequent versions; and the large differences in examinee tactics
that were initially noted no longer occur.

[...] of the "number of items attempted" and of the "number answered
correctly" are almost always the same.

The most effective way of encouraging the examinees to work at
the peak pace that will yield the maximum score on these types of tests
is to use a greatly exaggerated pace during the demonstrations. If the
examiner works the sample exercises at frantic speed, the examinees
will immediately get the idea, and the spectacle of a harassed examiner
trying (unsuccessfully) to beat the clock will also help considerably in
easing the tensions that typically inhibit all-out performance. From
the point of view of uniform test-taking behavior, speed tests require
the least demanding controls.

The only caution to be observed with these tests is that they 
should not be given as the first test of the session. Most groups need 
the warm-up a more tightly controlled test provides to prepare them 
for high-speed performance. 

Power Tests 

A pure power test is at the other extreme of the speed-accuracy 
continuum in that in this type of test, at least theoretically, time is 
not important. The differences in the examinees' scores derive
strictly from the inherent difficulty of the test items; and, even if 
infinite time were permitted, the results would presumably still be 
the same. In practice the term is used more broadly to include all of 
those tests in which the test constructor wants every examinee to 
attempt every item, and has made the time limits unusually generous 
to bring this about. 

From the point of view of the cost of testing, "unusually generous"
must normally be limited to a maximum of about one minute per item,
however, and if the test is to serve its purpose, the examinees must
all work at this pace or faster. The normal precaution is to tell the
examinees periodically during the course of the test how many minutes 
remain, to encourage those who have fallen behind to work a little 
more quickly. And, for certain tests, this standard technique is 
adequate also in the developing countries. It is fully effective for the 
Information Tests, probably because of the advanced educational levels 
at which these are given; and at least satisfactory for the lower-level 
Reading and Verbal Analogies Tests, perhaps because the simplicity 
of the language used in these items encourages faster performance. 

But for the more novel types of test items, such as those of an
abstract reasoning test, these periodic reminders were not sufficient.

[...] groups are in themselves not enough. Because the examinees are
still learning about the management of time when the test begins,
many of them will continue to adjust their performance throughout
the test, and not only interexaminee but also intraexaminee differences
in tactics may be encountered.

As a further remedy, therefore, experimental studies were also made
of the value of statistical corrections that might help to equate the
performance of the more conservative and the more reckless examinees,
so that their scores could be compared. Using maximum reliability
as the criterion, it was found that subtracting the number of mistakes
from the number of right answers was the formula that consistently
gave the best results in the African studies. This formula has been
used since for scoring all time-limit tests, but its utility has not been
verified for other locations; this should be done.
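
The rights-minus-mistakes formula can be sketched as follows. The function name and the use of `None` for an item left unanswered are assumptions made for illustration; the source states only the formula itself.

```python
from typing import Optional, Sequence


def rights_minus_wrongs(responses: Sequence[Optional[str]],
                        key: Sequence[str]) -> int:
    """Score = number of right answers minus number of mistakes.
    Items left unanswered (None) count neither for nor against."""
    if len(responses) != len(key):
        raise ValueError("responses and key must have equal length")
    rights = 0
    wrongs = 0
    for given, correct in zip(responses, key):
        if given is None:
            continue  # omitted or not reached
        if given == correct:
            rights += 1
        else:
            wrongs += 1
    return rights - wrongs


# Hypothetical six-item test.
answer_key = ["A", "C", "B", "D", "A", "B"]
conservative = ["A", "C", "B", None, None, None]  # 3 right, 0 wrong
reckless = ["A", "C", "B", "A", "B", "C"]         # 3 right, 3 wrong

conservative_score = rights_minus_wrongs(conservative, answer_key)
reckless_score = rights_minus_wrongs(reckless, answer_key)
```

On a simple number-right count both hypothetical examinees would score 3; under the correction the examinee who guessed wrongly on the remaining items drops to 0, while the one who stopped keeps 3.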


STREAMLINING THE TESTING PROCEDURE 

Once a fully effective testing procedure has been developed, the 
test constructor should continue to look for ways to make it more 
efficient. If he can increase the number of examinees that can be 
tested in each testing session, reduce the amount of time each session 
requires, or make do with fewer administrative assistants, substantial 
savings can be effected. 

The major constraint on the size of the group that can be tested 
is the physical layout of the room in which the tests will be given. In 
a typical rural classroom— crowded and poorly lighted, the examinees 
seated at double or triple desks— a group of 35 to 40 examinees is the 
largest that should be attempted. But in an auditorium with individual 
seats and broad aisles, as many as 150 to 200 can be tested with com- 
parable results. And it is therefore well worth the effort to make 
advance arrangements for the best possible testing site at each location. 
Town halls, cafeterias, churches, and even football fields were used
in the African studies to reduce nation-wide testing programs to 
manageable proportions. 

The most significant savings in time that can be effected, at 
least in the I-D procedures, is in the distribution and collection of 
the many separate sheets of paper that these tests require. Assembling 
the four or five practice papers that will be used into a single booklet 
that the examinees are given at the beginning of the session and do 
not return till the end is one highly useful procedure. Similarly, the
answer sheets, especially when preprinted with the examinees' testing
[...]

3. The explanation of the task itself should be programmed into
discrete stages, beginning with the central concept and then adding 
the others more slowly. This type of sequential approach should also 
be used in deciding which tests should be given near the beginning and 
end of each session, so that the examinees can learn the complex 
tasks more readily by building on what they learned before. 

4. To the maximum extent practicable, each operation the 
examinees are to perform should be reduced to a physical action that 
the examiner can actually show. Visual aids are especially helpful 
in reducing "mental" tasks to observable operations. 

5. The development of the oral commentary that accompanies 
the demonstration may require extensive trial and error. In certain 
tests, the choice of a single word can be critical to the effectiveness 
of the procedure. 

6. The final step of the instructional sequence should be a
supervised practice session in which each examinee gets feedback on
his performance. The time limit for the practice exercises should
be proportional to that used during the actual test.

7. To evaluate the effectiveness of the teaching procedure the
test constructor may rely on the proctors' evaluations, on analyses
of the distribution of scores, or on special indexes based on the nature
of the examinees' mistakes. The last of these will usually provide
the most accurate indication.

8. To assist the examinees in understanding and following the 
"rules" of the test, the most effective approach is to design the pro- 
cedure so that it is physically impossible for them to make a mistake. 
Unambiguous and consistent signals and the effective use of proctors 
are also important. 

9. The major source of interexaminee differences in test-taking 
behavior is the variability in individual tactics that time limits permit. 
To obtain comparable performance samples from the more conservative
and the more reckless examinees, special controls, training, and
statistical corrections may be required. 

10. The efficiency of the I-D testing procedures, with respect
to capacity, time, and personnel requirements, can be increased by
making appropriate arrangements in advance of the testing session.
For certain groups, some of the more time-consuming features of
this approach can be replaced with conventional testing procedures.

To illustrate the entire I-D process, the examiner's instructions
for the Figures Test are shown in Figure 2 exactly as they appear in
the I-D Examiner's Manual. The detailed guidance that these
instructions provide the examiner, so as to promote uniform testing
procedures at all locations, should also be noted.


Each of the techniques described in the preceding chapter was
applied in the development of one or more of the tests in the I-D
Aptitude Series. This chapter describes nineteen of these tests, both 
as an illustration of the adaptation techniques and as an introduction 
to the following chapter on practical test applications. 

Each description begins with a summary of the special features 
of the test, as it was originally developed for use in West Africa, and 
then notes the ways in which this original version had to be modified 
for use in other countries. It will be seen that relatively few
modifications had to be made in either the test content or the testing
mechanics, and that the approach to test adaptation suggested in the
preceding chapter appears to be safely generalizable to a variety of
cultural settings. This was the major finding of the "generalizability
phase" of the AID/AIR research.


REASONING TESTS 

Five of the I-D tests measure skills in working with concepts
and logical relations, and are in this way similar to the instruments
popularly called "intelligence" tests. They are based on three different
types of test items, and are intended for use with examinees at different
levels of education.

The Similarities Test is entirely pictorial and is intended for
use with illiterate or semiliterate groups, or with examinees who
are functionally illiterate in the second language in which the testing
program must be conducted. The Verbal Analogies Test (Low) and
the Verbal Analogies Test (High) use printed words rather than
pictures, and the Reading Comprehension Test (Low) and the Reading
Comprehension Test (High) are based on paragraph-long descriptions
and explanations. The Low forms of these tests are of approximately
equal difficulty and are intended for examinees who have the equivalent
of a primary school education. The High form of the Verbal Analogies
Test is of intermediate difficulty and is intended for examinees with
a few years of postprimary education or training. The High form of
the Reading Comprehension Test is more difficult still and extends
the range to the university level.


I-D Test 1: Similarities

[...] the intended answer [...] objects are open rather than shut.


FIGURE 3

Sample Items of the I-D Similarities Test

[Drawings for the two sample items are not reproduced.]

When the examinees reach this item, the examiner says: "Number Five:
Tin-Trunk-Book-Lock-Chest of Drawers. Four of these are in one way the
same. Mark the ONE that is different."

When the examinees reach this item, the examiner says: "Number
Twenty-two: Dog-Bicycle-Roof-Umbrella-Spoon. Two of these things are in
one way the same. Mark the TWO that are the same and be sure to mark
TWO."


But it could also be argued that the right answer is "tin" because all
of the others are reusable while this one is not. And in the second
item, the intended answer of "roof and umbrella" competes with "dog
and bicycle," using the concept of mobility rather than the intended
concept of shelter.


Whether the intended concept is sufficiently more logical than
the others to provide a usable test item is not a rational but an
empirical question. If analysis of the item's difficulty, reliability, and
predictive validity shows it to be effective, it can safely be kept; if the
statistical findings are poor, it must be revised or rejected. The
opinion of the test constructor is not particularly important.
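
The kind of empirical check described here can be sketched as follows. This is only an illustration under stated assumptions: item difficulty is taken as the proportion of examinees answering correctly, and predictive validity as the Pearson correlation between the 0/1 item score and an external criterion. The source does not specify the exact statistics used, and all names and data below are hypothetical.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson product-moment correlation."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


def item_difficulty(item_scores: Sequence[int]) -> float:
    """Proportion of examinees who answered the item correctly (1 = right)."""
    return sum(item_scores) / len(item_scores)


def item_validity(item_scores: Sequence[int],
                  criterion: Sequence[float]) -> float:
    """Correlation of the 0/1 item score with an external criterion."""
    return pearson(item_scores, criterion)


# Hypothetical tryout data: six examinees, one item, later course grades.
item = [1, 1, 0, 0, 1, 0]
grades = [70, 65, 40, 45, 80, 50]

difficulty = item_difficulty(item)
validity = item_validity(item, grades)
```

An item would be kept or revised on the strength of figures like these rather than on how logical its intended concept seems; the thresholds for "effective" are left open here because the source gives none.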


Where the test constructor's insights do matter, however, is
in the preparation of the trial items, since the number of empirical
tryouts that will be required depends mainly on the adequacy of the
first trial form. Here, knowledge of the examinees' first language is
probably the most important single requirement for effective test
construction. For even though this does appear to be
a nonverbal test, the relationships that are logical in a given cultural
setting are closely linked with the local language structure. The
concept of "open" in the first sample item is viable in English only
because English speakers use the same word for tins, trunks, books,
locks, and chests of drawers. In another language, where different
words are used to describe the openness of different types of objects,
this concept may not be a logical one at all.

For this reason, extensive use was made of the technique of 
asking examinees to explain their answers, as was earlier suggested, 
and of repeated item analysis studies. But even this was not sufficient. 
The present version of the Similarities Test is the product of seven 
cycles of item analysis and revision, and still is not so effective as 
the other tests of the I-D series. Tests of this type should be developed
by local specialists, not by an outsider.

[...] problem, most of the [...] were also [...] Similarities Test
illustrates the [...] suggested. The test is paced by the examiner,
who reads the [...] variations in test-taking [...] especially serious
problem with illiterate [...] portrayed in the item [...] question, to
ensure that the examinees will interpret the drawings correctly. The
pacing of the initial practice [...] proctors to verify that each
examinee has understood [...] working procedure. The pacing of the
later practice items is at [...] examinees with the amount of time
they will be given. The standard I-D demonstration procedure, using
visual aids and separate practice [...] pretest instructions [...]

[...] Test but one are thought to be necessary for the testing [...]
groups at the level for which this test is intended. The exception is
the intermingling of two types of test items, which was done to [...]
examinees would work at the pace at which the [...] questions. Since
the examinees would have to listen to [...] determine the type of
answer required, it was thought that they would neither linger nor
race ahead; this desired result [...] effected. But the examinees
might well keep pace even without [...] simply to hear the examiner
identify the five objects [...] Greater simplicity of administering
and scoring a test [...] one type of item suggests that this be
attempted.


Intercountry Modifications

The one change that was made as a matter of course in all of
the countries outside Africa in which the Similarities Test was
evaluated was to replace the drawings that are peculiar to the African
scene. Some of these changes were strictly stylistic, such as replacing
the typically African roof in the second sample item with the type of
roof most commonly seen in these other countries. [...] more
substantive, such as replacing the tropical fruits portrayed in
a number of the African items with fruits better known [...]
such as Korea. In making these latter changes it was necessary to
find substitutes that had all of the [...] important to the concept
underlying the item but [...] additional properties which would
introduce a second right answer as logical as the one desired.

Culture-tied changes in the concepts [...] to be required, at least
to match the [...] version. In Brazil and Thailand, the original
concepts were [...] that had no cultural connotations.

These findings suggest that [...] many tryouts that were [...]
development of the test are [...] generalizable across different
cultures. But it is virtually certain that further research would
generate concepts [...] test, and it is possible that the very best
items for each location would indeed vary from country to country.

Typical Results 

The reliability estimates typically obtained [...] Thailand, since
this [...] types of groups for [...] intended. For primary school
students, [...] the best that can be expected. [...] or so additional
items would raise these figures to more acceptable levels.


TABLE 1

Typical Reliability Estimates of the I-D Similarities Test

Country       Group                        Education         Number   r*

West Africa   Primary school students      6-7 years          5,189   0.56
Thailand      Young adults (16-22 years)   4 years or less      150   0.68
Brazil        Primary school students      5 years              140   0.51

*Since all examinees try each item, KR-20 coefficients were [...]
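
Reliability figures of the kind shown in Table 1 can be computed from a matrix of right/wrong item scores. The sketch below uses the standard KR-20 formula and, as an added assumption not named in the source, the Spearman-Brown formula to project how lengthening a test would change its reliability. The function names, the sample data, and the lengthening factor of 1.5 are hypothetical.

```python
from typing import Sequence


def kr20(scores: Sequence[Sequence[int]]) -> float:
    """Kuder-Richardson formula 20 for rows of 0/1 item scores
    (one row per examinee, one column per item)."""
    n = len(scores)
    k = len(scores[0])
    if n < 2 or k < 2:
        raise ValueError("need at least 2 examinees and 2 items")
    totals = [sum(row) for row in scores]
    mean_total = sum(totals) / n
    # Population variance of total scores, to match the p*q item variances.
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    if var_total == 0:
        raise ValueError("total scores have no variance")
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in scores) / n  # proportion right on item j
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)


def spearman_brown(reliability: float, length_factor: float) -> float:
    """Projected reliability when test length is multiplied by length_factor."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical 5-examinee, 4-item score matrix.
matrix = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
observed = kr20(matrix)

# Projection from the West Africa estimate of 0.56 in Table 1, assuming a
# test made half again as long with comparable items.
projected = spearman_brown(0.56, 1.5)
```

Under these assumptions the projection from 0.56 comes to about 0.66, which illustrates why adding items is a plausible route to more acceptable reliability; the actual gain would depend on the quality of the added items.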


[...] Thai [...] was .41 against the criterion [...] vocational
training center, [...] courses. Elsewhere, validities [...] ratings,
or other performance evaluations [...] for planning purposes [...] a
reasonable projection. When the examinees are primary [...] of school
for five to ten years [...] results tend to be better than with
groups at younger ages.

[...] Similarities Test is that it can be administered [...] in any
language by an untrained examiner [...] reasonable accuracy [...] at
least the 10 percent or [...] accuracy drops sharply when [...] larger
percentage of the examinees [...] and further revisions are necessary
[...] ready the test for such applications.

I-D Tests 2 and 3: Verbal Analogies

[...] years of formal education are to be tested. The High form [...]
less often because individuals with nine to ten years of education
[...] "in between" the major selection decisions made in most
countries; above this level the High form of the Reading Comprehension
[Test] has been found more effective.

General Description 

Each form of the Verbal Analogies Test contains [...] the type shown
in Figure 4. There are twenty [...] equal difficulty on each side of
the test paper, and [...] problems are administered as two [...]
examinees complete the test in [...] allotted time [...] and
seven-and-a-half minutes per part, [...] the High forms. The total
administration [...] approximately 30 minutes, and the test can be
scored by machine.

The Adaptation Procedure 

The major problem [...] was that the concept of [...] to be extremely
difficult to explain. All of the initial attempts [...] as "these two
things are alike in a certain way, and these [...] way" led mainly to
mass confusion; the [...] produce a satisfactory explanation to use
with twelve-year-old African students, [...] that did prove effective
was the simple one of not trying [...] all. The examiner merely points
out [...] in the first demonstration item, and [...] examinees:
"Mother and daughter, father and [...]" [...] goes through seven
additional samples, each time [...] group has responded; later he
explains [...] practice problems that the examinees work individually
on their [...] At the end of this session, [...] required, even though
none of them may be able to explain [...] analogy concept in the
abstract. The test itself can then be given.


FIGURE 4

Sample Items of the I-D Verbal Analogies Test
(Low Form)

[Sample items largely illegible in source; legible fragments:
"bicycle and wheel," "[...] and pound," "clock and ?"]


[...] the procedure, the effectiveness of the instructions [...]
numbers of test problems that the examinees answered [...] the nature
of their mistakes. In the first [...] example, an examinee who does
not understand [...] analogy concept will tend to ignore the top line
of the problem [...] answer, because [...] pair than "boy-foot,"
"boy-ride," or the other [...] counting the number of examinees who
consistently [...] logical pairs, it was possible to estimate the
number who [...] understand the instructions. The revision and
rephrasing of the instructions was continued until [...] percent
comprehension had been achieved.

[...] through a series of straightforward [...] unusual problems were
noted, [...] there any difficulties with the [...] mechanics, which
follow the standard I-D procedure.

[...] translating [...] problems in [...] possible to retain [...] the
original [...] with only such modifications as [...] number of [...]
to be replaced; and [...] items were added to the test [...] form
[...] Thailand [...]

The basic reason [...] of the original concepts was that the [...] not
accommodate certain word-pairs that in English are suitable for use in
analogy [...] Thus, the item "school and study; office and ?," in
which "work" is the intended answer, would in Thai be written as
"building-for-learning and learn; place-for-work and ?," in which the
answer would be indicated by the word structure. The item "school and
principal; army and ?" was used instead. But this was the only type of
modification that was at all attributable to cultural factors. The rest
were straightforward technical improvements based on item analysis
data.

In Korea it was decided not to attempt to translate the [...]
version, and a totally new 48-item test was developed. As a result, the
generalizability data from the Korean studies [...] analogy format is
suitable also in this country; they provide no information about the
transferability of the original items.

[...] Africans reach the last grade of primary [...] a more highly
select [...] possible advantage [...] giving this test with printed
instructions is that, since it is used mainly for school selection,
the examinees' [...] understanding written explanations may itself be
predictive of their success in advanced academic courses.

Overall, the Verbal Analogies Test proved highly [...] to at least two
of these [...] but also the vast majority of the specific test items
were equally effective.

Typical Results 

The reliability estimates typically obtained [...] Analogies Test are
[...] estimates for the Low form [...] identical at all locations,
[...] somewhat lower. But all [...] adequate for operational use.

Most of the validity [...] Test are based on student [...] therefore
are not representative of [...] applied. In these studies, validity
coefficients in the range of .40-.45 typically have been obtained
against the criterion of end-of-term grades. In a study of 669 Thai
seventh-grade students, in which current course marks were used as the
criterion, the coefficients were in the range of .60-.65, which are
probably higher than would be obtained from follow-up studies based on
grades in secondary school courses. [...] purposes, a figure of
approximately .50 is a reasonable [...]


I-D Tests 4 and 5: Reading Comprehension 

[...] Reading Comprehension Test has been used in two ways. [...]
alternative to the Verbal Analogies Test for a [...] desirable to
[...] evidence [...] candidate's reading proficiency as a by-product
of the [...] screening [...] to reduce the numbers [...] applicants,
[...] finalists who are then given a more comprehensive series [...]
including Verbal Analogies [...] three or four others. [...] proved
effective for both of these applications.


TABLE 2

Typical Reliability Estimates of the I-D Verbal
Analogies Test

[Table body largely illegible in source. Legible entries include the
forms Low (40 items), Low (48 items), and High (40 items); the
countries West Africa, Brazil, and Thailand; the groups primary
students (6-7, 5, and 7 years of education) and secondary (9-10
years); and the number and r* pairs 374 and 0.87, 137 and 0.85, and
669 and 0.84. The assignment of these values to rows cannot be
recovered with certainty.]


The High form has had much wider use, serving as an all-purpose
selection device at the higher educational levels. It has been used for
general university admissions, for postsecondary technical and
commercial training institutions, and, to a limited extent, for
postuniversity professional courses. Its range of difficulty appears to
be adequate to accommodate this variety of practical applications.

General Description

The format of the Reading Comprehension Test is shown in 
Figure 5. The examinees are to read each paragraph, and to decide 
which of the alternatives listed for each of the blank spaces is the 
word that has been omitted. The Low and the High forms include 40 
and 42 items, respectively, and most of the examinees finish in the 
25 minutes allotted. Because the task is an especially easy one to 
explain, the total administration time seldom exceeds 30 minutes, 
and the one-page paper test can be scored by machine. 

The Adaptation Procedure 

The major problem in the development of the Low form of this
test was that the standard technique for measuring reasoning ability
within the format of a reading exercise, which is to write items that
require the examinee to draw logical inferences from the information
[...] examinees who have only minimal skills in the language in which
they are being tested. It was difficult to find paragraphs that were
simple enough for the examinees to read and understand and yet meaty
enough to permit reasonable test items to be written. And, given
suitable paragraphs, it was difficult also [...] element of reasoning
without [...] students with only a primary school education.


FIGURE 5

Sample Items of the I-D Reading Comprehension Test

A boomerang is a curved club which has been used for hundreds of
years in Australia. Although in the past it served only as a weapon,
there are now two kinds of ____. A ____ one, used only for sport, will
return to the thrower when it ____ by a skilled person. The ____ war
boomerang does not ____ when thrown.

Rocket and jet motors are very similar in that both the vehicles in
which they fit ____ by expelling particles at a very high velocity.
Such particles must have not only ____; they also must have weight to
provide the necessary reactive force. A torch light does expel
particles at the speed of light, but the particles lack ____ so that
the reactive force is almost ____. The primary difference between
rocket and jet motors is that, whereas jets must burn oxygen from the
air, rockets carry their ____ with them so that they can fly above the
____.

[Item numbers and answer options are largely illegible in source.]

[...] illustrates the kinds of issues that [...] paragraph [...] the
construction objectives [...] the following: [...] sentences intact to
enable the examinee [...] boomerangs that are different [...] examinee
is able to infer that any feature of one of the boomerangs [...] blank
must be just the opposite of the other boomerang's characteristics.
[...] return-not return, [...] small-large, and [...] paragraph in
which the items [...] this contrast was too difficult [...] the other
pairs [...] the second feature mentioned that should be deleted [...]

Administratively, the test presented no problems, as the examinees
readily understood what they were to do. Further, the standard
I-D visual aid was fully effective in explaining the marking procedure.

Intercountry Modifications 


Because the contextual clues on which these items depend would 
seem to have limited generalizability from one language to another, it 
had been thought that direct translations of this test would probably not 
be effective. But the results of the two translations that were attempted 
turned out to be as good as those obtained from the original version, 
and suggest that this type of test is much less dependent on language 
idiosyncrasies than had been suspected.

The Low form was translated only into Portuguese, and the High 
form only into Korean. No changes were made in either country; and, 
as will be seen presently, the estimates of reliability and validity that
were obtained were entirely adequate to warrant the operational use 
of these verbatim translations. 

Typical Results

The estimates of reliability typically obtained for the Reading
Comprehension Test are summarized in Table 3. They are at
approximately the same level for both forms and across all three locations.


TABLE 3

Typical Reliability Estimates of the I-D Reading
Comprehension Test


Form   Country       Group                Education     Number   r*

Low    West Africa   Primary students     6-7 years      1,572   0.73
Low    Brazil        Primary students     5 years          139   0.80
High   West Africa   Postsecondary        11-14 years      295   0.74
High   Korea         Telecommo trainees   Mixed            138   0.77


*All are KR-20 coefficients except the Brazil estimate, which
is an odd-even coefficient of correlation.
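
The two kinds of coefficient named in the footnote to Table 3 can be illustrated with a minimal sketch. It assumes items scored 1 (right) or 0 (wrong), and the small response matrix is hypothetical. The Spearman-Brown step-up applied to the odd-even correlation is a common convention and is an assumption here; the handbook does not say whether the Brazil figure was stepped up.

```python
from statistics import mean, pvariance


def kr20(responses):
    """Kuder-Richardson formula 20 for rows of 0/1 item scores."""
    k = len(responses[0])                      # number of items
    totals = [sum(row) for row in responses]   # total score per examinee
    var_total = pvariance(totals)              # population variance of totals
    # Sum of p*q over items, where p is the proportion answering correctly.
    pq = 0.0
    for j in range(k):
        p = mean(row[j] for row in responses)
        pq += p * (1 - p)
    return (k / (k - 1)) * (1 - pq / var_total)


def pearson(x, y):
    """Product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def odd_even(responses, step_up=True):
    """Correlate scores on items 1, 3, 5, ... with scores on items 2, 4, 6, ..."""
    odd = [sum(row[0::2]) for row in responses]
    even = [sum(row[1::2]) for row in responses]
    r = pearson(odd, even)
    # Optional Spearman-Brown correction to full test length (an assumption).
    return 2 * r / (1 + r) if step_up else r


# Hypothetical data: five examinees, four items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(responses), 2), round(odd_even(responses), 2))
```

Because the two coefficients rest on different assumptions, figures of the two kinds are only roughly comparable, which is why the table flags the Brazil estimate separately.
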
In West Africa, the validity of the Low form against the criterion
of grades in the early years of secondary school was in the range of
.45-.50 when the test was applied to students already selected. In
Brazil the validity of the Low form for seventh-grade students against
end-of-year grades was .55; in general, an estimate of .50 is a
reasonable expectation.


form against grades near the end of 
was slirhtlv !ih° years of postsecondary education 

student! ^ sample of more than 700 West African 

. For a small sample of students in university-level commer- 

ir ‘he^Tcom- 

municauons trainees in Korea, an esUmate of .47 was obtained Vnr 

•=“ Senerai acrm~ses 

and near .40 for advanced specialized training can be projected.


INFORMATION TESTS 
Ihe examlne'e's' Sterest3°ta'and*antInS™r 

of occupations. The ratinnau broad categories 

IndlvldS accVulLtes a “^t eve!y 

ouUide of school as a r!sult of "incidental’ information 

observations, and th« eachVaUe!,^i^‘’i'®’ “'‘i everyday 

■nation about topics that he iTuiVf!? acquire more detailed infor- 
teplce that he dSes n?^‘ By metSa 

ta a particular occupation an informSi^ topics pertinent 

of his inclinations toward this ty^l P^yiiin an index 

b^c Js!! i;° ote''re'fir=“ b-" 

Wormauon test in the I-D sSu! "'b fnwest-level 

o^y six to nine years ol education i"ianded for examinees with 
bevice lor the sUUed trades. ^ mainly as a selection 

“liS!d!^ f Information 

“e s^f*'“''^‘-e"aduale l“ el bbove the 

Other ^ curricula resL^f selection into 

postsecondary lusUtutions. ' of universiUes or 
I-D Test 7: Mechanical Information

The I-D Mechanical Information Test has proved to be the most 
consistently effective of the I-D tests for vocational selection at the 
postprimary level. Because of its essential simplicity, however, its 
range of applicability does not extend to examinees who have had more 
than two or three years of postprimary education, and it cannot nor- 
mally be used for selection into advanced technical courses. For 
such applications, a more difficult form of this test should be con- 
structed, using similar developmental procedures. 

General Description 

The Mechanical Information Test consists of pictorial items 
about which the examiner asks questions orally, one at a time, as 
illustrated in Figure 6. There are 56 items, and approximately 35 
minutes are required to administer the test, including the pretest 
instructions. The test is printed on a single sheet of paper and can
be scored by machine. 


The Adaptation Procedure


Developing a sufficient number of questions suitable for youngsters 


FIGURE 6


Sample Items of the I-D Mechanical Information Test 


When the examinees reach this item, the examiner says, "When a
blacksmith shapes brass, which one of these things does he use first:
the water . . . the hammer . . . the anvil . . . or the fire? Mark the
one that the blacksmith uses first."

When the examinees reach this item, the examiner says, "If you
remove the top of a bicycle bell and look inside, which picture best
shows what you would see inside a bicycle bell? Mark one."
from rural locations required nearly a year's time, as earlier noted.
The first step was to list as many phenomena as the staff could think [...]

universal to serve as topics for technical items,
irrespective of the degree of local modernization. This list included
common household procedures, 
mTrrn^s f commodities, such as candles or 

Thrseccnd^, '"'cre known to be found in all parts of the country. 

?or ° “ ‘“^60 sample of viUages, and to look 

Ust° Msir “““ ^^cd ’to the lurtLl 

shops sewi^ ma^h^ •opfcs as bicycles, blacksmith 

that were likeiw 4 ** ®valent in at least those locations 

tS tur?sip ''c=‘‘“onal courses. And 

the numerous item ai^tySs sMiiTthm !! 

version. eventually produced the final 


Insolar as possible was to tr^ d *'^0 “0*. the approach 

ia Figure 6. In the first illustrated 

have an opportunity to watch uL hf'l" ““ youngsters 
not all of them will be sidlicimtm 'V '“'^cksnulh at work, but that 
to note and recall what he actuallv !?**”*!*'* ^ activity 

to Identily those yowsLs second, the intent is 

Badgeu they use to t^- m IltS tut whfl^i' “t® 

depend on observations of nhon hat is inside. Both answers 
nludied in school Phenomena quite dilfereat from those 

Jttcr t":e~rmttittLt;““f -cc patterned 

Having the esaminer read enrh Test, and are highly effective 
^ecs wiu 0, ^ mtfrprei ensuresth^ufitiat- 

a.ma.a^pnresti.-~^^^^^ 

-cihodtt”^" test and tt. 

^ - - -o ?.the"^o^,r (k-L 
and Thailand) certain of the line drawings were changed to equivalents
more familiar to the examinees. 

Whether these kinds of changes are sufficient to permit the use 
of this test in a different culture, or whether the content also must be 
adapted cannot be determined from the limited data so far assembled. 
The results to date show that the original Mechanical Information 
Test was a reliable measure of "something" in these three countries, 
but have not yet confirmed that this "something" is in fact related to 
performance in vocational courses. Pending further validity studies, 
little can be said about the generalizability of the specific items
developed in West Africa for use in other cultural settings. 

There is some evidence, however, that the use of printed 
instructions, as was tried in Thailand, does change the character 
of the test and should probably be avoided. When scores on the 
original version of the test were compared with the grades of ten 
classes of seventh-grade students in their major academic courses,
an average correlation coefficient of approximately .25 was obtained. 
But when the same comparison was made for ten classes which had 
been given the test with printed instructions, this coefficient jumped 
to .43, indicating that more scholastic ability (and perhaps less 
mechanical aptitude) was being measured. This suggests that the 
oral administration procedure should be used for vocational applica- 
tion, even when the test can be given in the first language of the 
examinees. 
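
The comparison above rests on averaging the correlation coefficients obtained in ten separate classes. The handbook does not say how the averages of .25 and .43 were formed, so the following is only a sketch of two common options: a plain arithmetic mean, and a mean taken through Fisher's z transformation. The class-level coefficients shown are hypothetical.

```python
import math


def mean_r(rs):
    """Plain arithmetic mean of a list of correlation coefficients."""
    return sum(rs) / len(rs)


def fisher_mean_r(rs):
    """Average correlations on Fisher's z scale, then transform back."""
    zs = [math.atanh(r) for r in rs]
    return math.tanh(sum(zs) / len(zs))


# Hypothetical coefficients for ten classes (not data from the study).
oral = [0.18, 0.31, 0.22, 0.27, 0.35, 0.15, 0.24, 0.29, 0.20, 0.26]
print(round(mean_r(oral), 2), round(fisher_mean_r(oral), 2))
```

For coefficients of moderate size the two averages differ only slightly, so the choice of method would not change the conclusion drawn in the text.
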

Typical Results 

The reliability estimates typically obtained for the Mechanical 
Information Test are shown in Table 4. They are reasonably uniform 
and show that at least the reliability of the instrument was not affected 
by transplanting the items to a new cultural setting. 

The validity of the test in West Africa has been in the range of 
.45- .55 for trainees in a variety of skilled trades, as determined 
through both concurrent and follow-up studies of grades in practical 
courses. In the subsequent study at the Malawi Polytechnic which 
was discussed at some length in Chapter 2, a coefficient of .59 was 
obtained. But in Brazil a validity of only .28 was obtained in a study 
of 127 technical trainees rated for purposes of the research by their 
instructors. To what extent this more modest result was attributable
to the criterion problems that were encountered, and to what extent
it indicates a need for item modifications, has not been determined.

The Brazil study, moreover, was the only application of this test 
TABLE 4


Typical Reliability Estimates of the I-D Mechanical
Information Test


Country       Group              Education   Number   r*

West Africa   Primary students   6-7 years    2,033   .79
Brazil        Craft trainees     Mixed          203   .70
Korea         Textile workers    Mixed          144   .76
Thailand      Primary students   7 years         30   .73




« estSronSMotabir^rrt^ ? coimlrles outside Africa, and 
yet be projected Within locations therefore cannot 

eapecled. ’50 can normally be 

above ttolmUI the Siie^trad applications below or 

and in Korea. The tryouts cltrr^ om Thailand 

students in the eleventh and toelM ms "Hh 

carried out in Korea with semlsklUed^ additional tryouts 

results, confirming the West Afrtm r Save uniformly poor 

Mechanical Information Tests^D^J '"®®' “““‘T 

selecuon for the skilled todes 011 ^^ 1“ be limited to 

at the postprimary level. 


I-D Tests 17 and 18: Science and World Information

aad have so la^id onlyTmif.?® ”* '*'= 1-° lasts to be developed 
us^separately lor se ctn “PPUiaUohs. They Cbf ’ 

such?,',?''’ « bbem?si^„???? b“”lcula-Sclence Informa- 
used llrn'I? “ ®“"°o>ics or pubU? atoln'i ?’ Ibl“™ation for 
?■ ‘b ““sutuie a "eenem?^, "‘'"‘““-“i- Ibey may be 
?'* “ '7“* “actol lor admission tTlh?^'’™^"”'’' ‘'bt of the type 
1 educational Insututlons. Ebaeral program of higher- 
Both tests contain forty multiple-choice factual items of the
type shown in Figure 7. When only one is administered, approximately
30 minutes are required; when given serially, [...]
suffice for both. Each test is printed on a single sheet of paper and
can be scored by machine.

FIGURE 7

Sample Items of the I-D Science and World 
Information Tests 

14. Women usually sing at a higher pitch than men because their
    vocal cords are

    smoother.   longer.   tighter.   flexible.

15. The chief ingredient in window glass is

    nylon.   amber.   cellulose.   quartz.

76. The main concern of UNICEF is

    disease.   children.   agriculture.   disarmament.   law.

77. Bamako is the capital of

    Angola.   The Gambia.   Niger.   Gabon.


The Adaptation Procedure 
since these two 

levels, and since their con en format of information tests 

JTol'requUed.'Xr^^^ problem was to find a sufficient number of 
the elJcSs ol minimize

so afto measure ho?™,. ==hool 
In accordance with the bad- inlormation he had acquired 

oIIormaT^ucrtta ^aa E'^rtaating the eHects 

science are taught in school aou^* ‘mpossibie, since ail aspects ol 
answer to any question that any examinee could have learned the 
courses. But because ®ight include in one of his prior 

three science subjects it seemed”? *^hf tn only two or 

covering a much broader range oi??*'!? ^fleets by 

to have encountered in sch?falone?h? 

inlormation in addiUon to for? n ’ requiring Incidental 

achievement of a high test ‘or the 

anatomy, astronomy, medicine h- i "ilscellany of items concerning 
mathematics was Included in the “'’^"’’“'■7, physics, and 
sample items in Figure 7 an ^ 

””” -™ai,y strerdtattSaYcrurs^f ” 

Test was to o! the World InlorraaUon 

Inlormation lhat might be predict? nategorles ot 

“in ‘ndicauJe ’on™?” 1“^” in'^antal 

nior?k ” "'“‘y so easy to s?? *®wnnd "nonscience” 

“ecued '’“‘y “>4s »ose that relate to the 

eciaed to focus on evenu rtiT^ scientific. Eventually itwa<i 

^'Su'nrly ht’ aetocai?*””’ °iSanizations that 

*" nntlonal and inteS "'Y^Pnpers because of their 
nesultlL^tes?”””””''' dlecUve. MuS*onh‘”f%‘“” approach was 
Alrican counlrT””’ ““rse, highly sDec ?”i”^°™“*‘“" “ ‘•'n 
were usedM*?.” ’ “ "ns the newsn^? interests ol the 

as the primary source lor The^'^'^f West Africa that 

A -nond problem ^-=‘»Pn.ent of the quesUons. 

*nore frangible ^ effort virtually every 

’ ”■ '‘“-‘lonJ oU^rrlgbUevT 
of difficulty for the examinees. The present level appears to be most
appropriate for individuals with twelve to fourteen years of education, 
at least in the African countries. 


Intercountry Modifications 

Because of its heavy emphasis on African affairs, the World 
Information Test was judged unsuitable for use elsewhere, and was
not administered at any of the three other locations. It is likely that 
a totally new form would be required for South America, and one or 
two still-different forms for the Asian countries. 


The Science Information Test would seem to be less dependent 
on cultural factors, and a translation of this test was prepared for
use in Korea. It appeared to be reasonably [...]
study in which it was applied, but considerably [...]
assembled to determine whether or not the original items [...]
fact be used without modification.


Typical Results 

The estimates of reliability that have ‘‘’g 

Science and World Information Tests are 

The reason for the very high figure obtained In K^ea is ataost 

certainly the heterogeneity of the examinee 

education from the elementary to the university 

estimate of .70-. 75 for university-level students seems app P 

for both of these tests. 

VaUdity estimates for .be -ts nsed '^n W^^^^ 

and in Malawi were in the range of .30 . , ^ overall grades 

above . For planning purposes, an esUmate of approxima 
seems appropriate for this type of tes . 


NUMERICAL TESTS 

TbeW-rm=-n^ns^en^^^^^^^^ 

one test that combines q .cholastlc technical, and com- 

m“l’rcki?.^le=^^""^‘rwr^^^^^^ sbills are required. 
The I-D Arithmetic Test can be used from the [...]

up to the university level, and has given good [...]
the above applications throughout this ' ®f^dXed 

apparent simplicity is s” min« 

applicant groups, it is preferable to adm mister a seemmg y 
level test to examinees who have completed nme o S 

formal education. 

The I-D Graphs Test i^s " “etuySS 
later years of secondary school, and is y 

up to and including the university le rr.ggt with which it 

an appropriate replacement for the Arithmetic Test, witn w 
has a moderately high correlation. 


I-D Test 14: Arithmetic 
Although the Arithmetic Test 

numerical calculations, its purpose * ^ ^ g^gntly perform the 
accuracy with which the examinees require. Rather, 

arithmetic operations ji^raclMistics of individuals who have 

the rationale is that one of t^gs mathematical is 

a generally high aptitude for calciAaUons at high rates 

that they are able to perform tUt of mental arithmetic 

of speed, and that their scores [his more general 

can therefore serve as a ®J'”'P7j,h„.tic Test should be regarded as 
aptitude can be inferred. TheAri vehicle of computation 

a fairly broad-spectrum test that apUtude in the 

problems to approximate the examine 
quantitative domain. 

General Description

The Arithmetic Test dM^^ 

Uon, and division problems, f the lime limit of 

two separately be sufficiently long to yield 

five minutes per part is 7 7, anv of the examinees to work 

reUable scores but Th^tal ^ministration Ume is 20 

rute‘raSrthe^^^^^^^^^^^ 

The Adaptation Procedure
FIGURE 8

Sample Items of the I-D Arithmetic Test


 90 +  35 =     65   125    55   120   121
 56 +  31 =     86    25    97    75    87
 48 -  35 =     18    13     3    15    23
 97 -  59 =     39    16    36    38    48
 64 x   7 =    568   450   448   442   508
 69 x   5 =    344   355   345   346   335
408 ÷   6 =     50    68    58    51    41
128 ÷   2 =     64    57    68    63    79


SSr; - ‘He 

been taught to the' r *“ “eithmetto thit ,h ot the 

Utit the Arithmetic ‘^equently waa in “emtoees had 

I'M to be ovcreomelf ‘'equlres, and these "“H ‘be one 

sa2r.r..“”“«“4Ks“’“ 
the desired approach required a three-step procedure.
The first was to administer the Arithmetic Test as one of the last
tests of each session, when the concept of time limits was already
well understood. The second was to stress speed throughout the
examiner's pretest instructions. And the third (and most effective)
was to let the examinees attempt to solve twelve practice exercises
in a one-minute warm-up session which demonstrated the rate of
speed necessary to complete even half of the test. That completing
the entire test was impossible was, of course, also explained in advance.

To compensate for the differences in strategy that persisted 
despite these careful instructions, the standard I-D correction for
speeded tests was applied. As noted in a preceding chapter, this was
to subtract the number of wrong answers from the number answered
correctly, so as to increase the comparability of the scores of the 
more reckless and the more conservative of the examinees. 
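
The correction described above, rights minus wrongs, can be sketched as follows. The answer key and the two answer sheets are hypothetical, and the use of `None` to mark an item that was not attempted is an assumption of this sketch rather than a detail given in the handbook.

```python
def corrected_score(answers, key):
    """Rights-minus-wrongs score; None marks an item that was not attempted."""
    rights = sum(1 for a, k in zip(answers, key) if a is not None and a == k)
    wrongs = sum(1 for a, k in zip(answers, key) if a is not None and a != k)
    return rights - wrongs


# Hypothetical eight-item key and two contrasting answer sheets.
key = [125, 87, 13, 38, 448, 345, 68, 64]

# A hasty examinee attempts every item but makes three errors.
hasty = [125, 87, 18, 38, 450, 345, 68, 63]

# A cautious examinee attempts only four items, all of them correctly.
cautious = [125, 87, 13, 38, None, None, None, None]

print(corrected_score(hasty, key), corrected_score(cautious, key))
```

On the raw number right the hasty examinee would outscore the cautious one (five against four); after the correction the order is reversed, which is the sense in which the adjustment makes the two strategies more comparable.
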

Intercountry Modifications

The Arithmetic Test was administered experimentally in all 
three countries. Because of the universal applicability of computational 
problems, no changes in the original items were thought necessary 
or attempted. 

Typical Results 

The reliability estimates typically obtained for the Arithmetic 
Test at various levels of education are summarized in Table 6. They 
are consistently high, as would be expected for this type of test. The 
lower figure in Thailand may be attributed to the fact that this is a 
different type of estimate, and perhaps also to the use of printed 
rather than oral instructions, which might have been less effective 
in communicating the speed that this test requires. 

Validity estimates against course grades at the secondary school
level were in the range of .30-.35 in West Africa, for students already
selected, in part on the basis of other arithmetic tests. In Thailand,
comparable estimates for comprehensive school students were in the
range of .40-.50 against grades in the regular academic courses. In
studies at the primary school level, the estimate of .15 obtained in
Brazil and the estimate of .62 obtained in Thailand are widely divergent,
probably a result of differences in the criteria that were applied.

Overall, a projected figure of .35 would seem to be a conservative 
estimate of validity for admission to the general academic program 
of secondary level institutions. 
TABLE 6

Typical Reliability Estimates of the I-D Arithmetic Test


Country       Group                 Education     Number   r*

West Africa   Primary students      6-7 years      3,684   .88
Brazil        Primary students      5 years          135   .85
West Africa   Secondary             11-12 years      138   .88
Thailand      Commercial students   11 years          61   .73
Korea         Technical students    12 years         228   .86
Korea         Telecommo trainees    Mixed            138   .85


correUHon between'the°seD^aI?i esllmale are based on the 

Thatland leluSteSwenu'''" 


comparable to the We^ or commercial courses, results 

obtamed in erght oUhf :5Je -'“-SO 

^ in Korea. la the alat sm“the *" 

lower, but it seems that on bailee , estimates were appreciably 

nelectioa is aa appropriate p^Som' 




- *'^>1 ii: Graphs 

;^eI.D sTrlefS - iacladed la 

^s been a highly aselul p“ dTctorm '^P^ ‘^=1 

toto advaaced lechaical courses ;^f 'or selecUoa 

1.1?“°”“ »ns the hope that thi^Sit careers. 

Jor^e more general twts nf r« substitute in cart also 

ll?eru----Oraphs¥eriS“m'S=P-|-^^^^^ 
General Description

The Graphs Test consists of thirty problems related to the [...]

Figure 9 and of a second part of thirty additional
problems related to another similar graph (not shown). In each item [...]

the shaded portion, and [...]
to find the corresponding value of the other variable as quickly
as he can by correctly reading the graph. The time limit of four
minutes per part is set so that no examinee will finish the test; and
the total administration time is approximately 25 minutes, including
practice and explanation. The test is printed on a single sheet of
paper and can be scored by machine.


FIGURE 9

Sample Items of the I-D Graphs Test

[Graph not reproduced]
The Adaptation Procedure

As illustrated by the sample items in Figure 9, several complexities [...]
the graph-reading operations that the items [...]
ordinate dimension of the
graph is scaled by units of five, while the abscissa is scaled by units
w <0 ‘he ordinate value cor- 

“‘“e. “d someumes vice versa; and 
S erid " ‘“‘eepolaUon between adjacent lines o£ 

nMuslv to a “e“‘”ee e“ood slmulta- 

St ae ItT^ , S "““erlcal and spatial, in solving 

™ ‘‘ems, and thereby to obtain a measure of Ms eeneral 
lacihly in coping with quantitative relaUons. ® 

suitabfe“s?r^cLnVwl?i^tt;,T?r 

the West African version Full *“ designing 

explanation proced"es Id f I-D 

session preceded the test This vI demonstration and practice 

had alreuy etudled graph-reSinv who 

prior study may in fact be a niathemaUcs courses; such 

test. Attempts to admLutlTLTl'f ■>“»= 

years Of secondary 

to be Partlcutorty‘^‘E^i,^“l^'JJ'“® t" explaining this test appeared 

erapM, for example, '^cl^^hrSe^^e '’” 

are at 40- or "any point on thlTltoelL J “do line you 

effective than the simple "this line is Iim it'"’ ''^^e much less 

fhe ' tf hltehtion to phrasing (Sh^I, A eventually adopted, 

the right phrase requires) are^ua^ ^ ^rror that finding 

test operations, as noted in ChaX 5 ’““‘'e complex 

sSS °?"= -hs 

Salable duplicating equipmen^o^ a locations at which 

repJSucSm !»Mtl„[ intSeYt“‘“'^'’^‘^ distressed when 

multiUUi equip™n't“oV“”tror t ^“”c™e.' aL'^s°s 

necessary prerequisite tuT^ STSs ^ mundane 
Intercountry Modifications

The Graphs Test was tried experimentally in Thailand and in
Korea without substantive change. In both countries the original
version was found to be fully effective.


In a recent study in Micronesia, an attempt was made to reduce
the complexity of this test so that it can be administered
at a lower educational level. The same scale was used for both
dimensions, the size of the grid was enlarged, and no interpolation
problems were given. These changes made it possible to administer the
test to eighth-grade students, but their effects on the test's predictive
properties have not yet been determined.


Typical Results 

The estimates of reliability typically obtained for the Graphs
Test are summarized in Table 7. As in the case of the Arithmetic
Test, the low figure obtained in Thailand is probably attributable to
the nature of the coefficient, and to the use of printed rather than oral
instructions.

The validity of the Graphs Test “ 

at the upper secondary school levels was m t 8 ^ ^ ( 1,5 

West Africa and in the range of .30-35 i”Thail^d. In Korea th^^^^ 
validity against grades in telecommimica ons , (j^ Graphs 

in the study of technical institute 7or geLral 

Test (as of all other tests) was of negligible pro^i^bons. cor g 
academic selection a figure of .35 may be projec 

For selection into advanced 

the results in Thailand were consistently highe 30-.40, res- 

figures. VaUdity estimates in the ranges of .4 . 40 is 

pectively, were obtained at these two location , 
a reasonable projection. 


TECHNICAL APTITUDE TESTS

Three of the tests in the I-D series are “‘‘‘‘ro^fd specmcally 
?e“^^rfrr\TeVfthe%TmVsUUe^^^^^^^^^ ar^ Jes.gnei for selecUon 
TABLE 7

Typical Reliability Estimates of the I-D Graphs Test


Country       Group                 Education     Number   r*

West Africa   Secondary             10-11 years      147   .83
Thailand      Commercial students   11 years          61   .50
Korea         Technical students    12 years         228   .78


*[...] coefficient. The other two are based on the correlation
between the separately timed halves of the test.


Into the skilled ..tv 

courses. They 

‘ balanced Index of abiute min “‘ber tests so as 

""“Pe-'ceptaal eomponenUi^t job ^riormMc‘’e°"’ 

ibatrument. used 


I-D Test 8: Checking

“‘T briefly because 

needs for low 1 p °f ^^tlaritles Test earlier 

-otad uTetm most oMhr '™ 

“are adranced ability meaTies was 

General Description

=“ >be"xl^m^°“ m'‘'° m^Th itl^^ ‘llnstraled 

- - 0^;^- £T"^rhe“m"Sre^ 

"M diners Irom an 01 the rest. 





FIGURE 10 

Sample Items of the I-D Checking Test 


[Drawings not reproduced]


The test is divided into two parts that have time limits of [...]
minutes each, and the total administration time seldom exceeds ten
minutes, including the pretest instructions. The one-page test paper
can be scored by machine.

The Adaptation Procedure 

The development of this test 

to conclusions about "obvious" cultural modifica ^ 

noted. It had been assumed that objects familmr m 
setting would have to be used as the basis for the 
test items, and a variety of more famiUar was developed 

as the first step in the adaptation procedure. fSnt made no 

Chapter 4, it was found that the nature of “^4”ge 

difference to the utility of the item— perhaps urouos that 

nor familiar objects were interpreted correctly by 8 

were tested. The standard test items tried wer . , mst 

especially prepared versions, and the selection of content for this 
posed none of the problems that had been expected. 

Nor was there any problem ta readlly°“^™- 

inees. The concepts of "same Md practice Items 

rHow rr ors^Tt"^v:rd“^^^^^^^^^^ - mr 

nonliterate groups. 

The one problem that this test does raise, however, is that
virtually flawless printing is an essential. Since most of the drawings
are not interpreted as representations of real objects by the examinees,
any imperfection in the reproduction of one of the drawings will make
it as "different" from the others as will the defect intended: if an
examinee's score on an item depends on whether he notices the
deliberate or the inadvertent difference first, the test results will be

break “ '•nii'tentional 

the bottom line of one of the five drawings may seem to an
examinee who does not recognize this as a faucet to be the defect

Intercountry Modifications

Knrea“No°c!L1erw?r“U'l‘'.t‘^“"f 

testing procedure. *" '“Ment of the items or in the 

Typical Results

are shown In Table 8.°^No"comtarStr"* *" Africa and Korea 
groups below the primary school level.' 

not attempted. in'S Korew studj’^}',® Checking Test were 
was obtained against the crltcrlon'^?Vr workers, zero validity 
the foremen of the workers whi ^ h/ evaluations made by 

test and its pro’^abl^^-"^^^ 


TABLE 8


Typical Reliability Estimates of the I-D Checking Test



Um«'‘^iL"e1 of the“'s..°" between the 
I-D Test 9: Boxes

The Boxes Test is a measure of three-dimensional visualization,
which is one of the basic abilities necessary for most technical
occupations. From a methodological point of view, its development was
perhaps the major accomplishment of the initial AID/AIR research in
that it demonstrated the feasibility of devising pencil-and-paper
measures of skills that had been thought to require apparatus tests
in the African setting. But its suitability for widespread operational
use has been limited by the demands it makes of the examiner, who
must be given special training and practice in the fairly complex
demonstration procedure that the effective administration of this test
requires.

General Description 

The Boxes Test consists of 48 items of the type illustrated in 
Figure 11. The object at the left of each item is a "pattern" that will
form one of the two cubes at the right when it is folded to bring the 
six faces together; the examinee is to go through the test as rapidly 
as he can, indicating which of the two cubes will be formed when this 
folding operation has been completed. The test is divided into two 
parts of equal difficulty, with time limits of four minutes each; but 
because of the lengthy demonstration procedure, a total of 40 to 45 
minutes overall is required. The one-page test paper can be scored 
by machine. 

The Adaptation Procedure 

The major problem in measuring spatial skills with a pencil- 
and-paper device is that the representation of three dimensions on a 
two-dimensional surface is a special convention that has to be 


FIGURE 11

Sample Items of the I-D Boxes Test 
drawtags'is thoro'Jih^lMma^lth‘’tM*^ ““"“'“‘ied by

««! test Problemslrre “y BurlL 

will see them the way they acliianr,f "tnaccustomed to drawing! 

representations. Ani Sasl^rtL two-dimensionj 

Alrlcan examinees generally do into ^’Ad shown that uneducated 

depth, apparatus telts SiS T"' as haying no 

were traditionally used 


teet, « wi\‘Smld^£t Peacil-and-paper 

sioMi drawings would have to be three-dimen- 

Jesting procedure. And since integral part of the 

ta taletpret a broad range oTd«X, t the examinees 


The neat task ur^o 

error ^t? P'o^dfre. 
EventuaUy, Uie Procedure were 

Slvtaa each '°“P‘« sequence now us.h ! “derstood. 
and manlpuute whif° models ol the cube*^^tk^' '“Otades 
oI these c^e^on^^ watching ihe dcmons^Uon.!!^' '“ok at 
model to the lacS.'^ blackboard, to rcSS , ’ drawmg a sketch 

o'ock-up to the cr. ^ “sed as a^l “'“‘'■’es to the 

drmonsSi “Ynf ” “>= «^toee w“‘ ‘Ms 

trom the “I the Items by reranvi ■’.'Notice sheet; 

rabo to the corrcs’,^, “ Mlol cube- Pattern 

•“M repeating thLS*"® '''= bUc°4’oa>-lf'*"® ‘°rme 

“rms and S e«h Mr the rem,f 3' ““''‘-“P. and paper- 

■ncd to eolve S' PMctlee eaerels^ Mb sample 

used In any nt ihe'mh '''bM proced^«' “aminecs have 

'b'nprehcnslon wtth^^^^" Mste, burw^rec^a^^,' 

The test „e 

dil/icuha. thenis«»iTr»_ _. 
problem. The most difficult exercises (such as Item 22) require true
three-dimensional visualization. This arrangement was adopted so 
that the examinees with lesser spatial ability would not become dis- 
couraged at the very beginning of the test, but is useful also in deter- 
mining the degree to which the instructions have been understood. An 
examinee who cannot solve an exercise such as Item 6 clearly has not 
understood what it is that he is to do. 

For examinees already familiar with drawings in three dimen- 
sions, these elaborate instructions are not required. But, inasmuch 
as there were no visible signs of boredom or restlessness on the part 
of the more advanced groups, at least in West Africa, the entire 
sequence was used at all levels to ensure uniform comprehension. 

Intercountry Modifications 

The Boxes Test was administered in its original form in all 
three countries, but for experimental purposes only. It was felt that 
the examinee groups typically being tested in these countries would 
have no difficulties in interpreting three-dimensional drawings, and 
the research effort was soon shifted to standard tests of visualization. 
In Brazil a standard cube-completion test gave good results with
groups at higher educational levels; in Thailand, a standard block- 
counting test was found to be a poor predictor of performance in 
eleventh-grade shopwork courses. But the data so far compiled are 
too fragmentary to permit an adequate evaluation of the suitability 
of standard visualization tests in these countries. 


Typical Results 


The estimates of reliability typically obtained for the Boxes 
Test are summarized in Table 9. That higher estimates were obtained 
for groups of heterogeneous educational backgrounds is normal for 
most types of tests. 
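
The effect of group heterogeneity on a reliability coefficient can be illustrated with a standard textbook adjustment. It is offered here only as a sketch: the formula is not given in the handbook, it assumes that the error variance of the test is the same in both groups, and the numbers in the example are hypothetical.

```python
def reliability_in_new_group(r_old, sd_old, sd_new):
    """Project a reliability coefficient onto a group with a different spread.

    Assumes the error variance, sd_old**2 * (1 - r_old), is unchanged
    between the two groups, so only the true-score variance differs.
    """
    if sd_old <= 0 or sd_new <= 0:
        raise ValueError("standard deviations must be positive")
    if not 0 <= r_old <= 1:
        raise ValueError("reliability must lie between 0 and 1")
    error_var = sd_old ** 2 * (1 - r_old)
    return 1 - error_var / sd_new ** 2


# Hypothetical case: a coefficient of .75 in a homogeneous group with a
# standard deviation of 4 score points, projected to a mixed group whose
# standard deviation is 8 score points.
print(reliability_in_new_group(0.75, 4.0, 8.0))
```

Under this assumption the same instrument yields a higher coefficient in the more varied group even though its precision has not changed, which is consistent with the pattern noted in the text.
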


Estimates of the validity of the Boxes Test against performance
in practical shop courses were in the range of .30-.45 in West Africa,
for a variety of skilled trades; and the estimate of .37 subsequently
obtained at the Malawi Polytechnic was [...]

In the other countries the estimates were .40 for the telecommunications
trainees in Korea, .35 for automotive trainees in [...]
.29 for trainees in four skilled trades in Brazil. In general, a figure
near .35 can be projected.
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TABLE 9 


Typical Reliability Estimates of the I-D Boxes Test 


Country 

West Africa 
West Africa 
Brazil 
Korea 


Group 

Mixed 

Mixed 

Craft trainees 
Commo trainees 


Education 

6-7 years 
8-9 years 
Mixed 
Mixed 


Number 

3,460 

1,050 

231 

138 


•All coefficients are based on the 
separately llmetl halves ot this test. 


correlation between the 


r* 


.84 

.83 


I-D Test 10: Figures 

[...] Test applications [...] is limited to two-dimensional [...] 
measure. But because it [...] require nearly so complex an 
administrative [...]. The correlation between the two tests is 
generally low, and they are normally used together. 

[...] forty items of the type shown in Figure 12. [...] may be found 
amid other lines [...] as quickly [...] the examinee [...] are 
required. [...] standard approach [...] 

FIGURE 12 

Sample Items of the I-D Figures Test 



The first modification was to [...] of the drawings that are [...]. 
Embedded in a large complex of criss-cross lines, [...] the examinees 
had to spend an average of [...] problems, and this was too long for 
practical test applications. Items that include the bare minimum of 
lines necessary to disguise the shapes were developed, as illustrated 
in Figure 12. 

The second was to [...] them sideways or [...], and this is explained 
explicitly in the pretest instructions. 

The third was to teach [...] shape" as part of the [...] 
instructions, most of [...] Africa would select [...] Figure 12, for 
example, or [...] because they are 
accustomed to precision in matching sizes and forms exactly. To 
teach this concept, a visual aid was developed that permits the exam- 
iner to remove the shapes from the key and to superimpose them on 
the sample and practice items, to demonstrate the precise fit that is 
required. 


With these modifications, the test could be administered effec- 
tively to examinees at all levels. Even examinees with minimal educa- 
tion could work the problems at an average rate of fifteen to twenty 
seconds per item, instead of the sixty seconds that the standard ver- 
sions required. 


Intercountry Modifications 

[...] administered experimentally in all three [...]. [...] 
modification required was to use letters [...] characters in Korea 
and [...] from left to right rather than vertically, as on the 
original version. 

Typical Results 

The estimates of reliability typically obtained for the Figures 
Test are summarized in Table 10. The high Brazil and Korea 
estimates are attributable to the heterogeneity of the examinee groups; 
the low Thailand estimate* to the use of a different type of coefficient 
and of written rather than oral instructions. 


TABLE 10 

Typical Reliability Estimates of the I-D Figures Test 

Country        Group                         Education    Number    r* 

West Africa    Mixed                         6-7 years     3,978    .70 
West Africa    Mixed                         8-9 years     1,282    .72 
Brazil         Craft trainees                Mixed           239    .84 
Korea          Telecommunications trainees   Mixed           138    .85 
Thailand       Primary students              7 years          30    .54 

*All coefficients are based on correlations between the separately 
timed halves of the test, except the Thailand estimate, which is a 
KR-20 coefficient. 

The estimates of the validity of the Figures Test against perfor- 
mance in shopwork courses were in the range of .30-.40 in West 
Africa; the estimate of .38 subsequently obtained in the study at the 
Malawi Polytechnic was consistent with this result. In the other 
countries, the estimates were .24 for the telecommunications trainees 
in Korea, .32 for automotive trainees in Thailand, and .32 for trainees 
in four skilled trades in Brazil. Overall a figure of .30 can be projec- 
ted. 


CLERICAL APTITUDE TESTS 

Three of the tests in the I-D series are addressed to speed and 
accuracy in routine operations, as required in many clerical job func- 
tions. Although of somewhat different levels of difficulty, all three 
can be used at the postprimary level, and they are normally used 
jointly for such applications. 

The I-D Coding Test is the most elementary of the three, and is 
normally dropped for the more advanced groups, who are apt to be 
bored by the strictly repetitive operations required. The I-D Names 
Test and the I-D Table Reading Test can both be used up to and 
including the university level. In combination with Verbal Analogies 
and Arithmetic, these three tests comprise the I-D clerical aptitude 
series. 


I-D Test 12: Coding 

The Coding Test is primarily a measure of clerical speed, 
although some elements of accuracy and of memory are also required. 
Historically, this type of test has sometimes been used as a nonverbal 
"intelligence" measure, but in these studies it was neither intended 
nor evaluated for such applications. It was included in the I-D series 
purely as a job element test that attempts to replicate the clerical 
task of rapidly copying data from source documents, to prepare a 
new listing or tabulation. 

General Description 

The test consists of 150 items of the type illustrated in Figure 
13. In the blank spaces under the symbols, the examinee is to write 
the corresponding symbols as rapidly as he can, using the key at the 
top of the page. The time limit is two minutes for each half of the 
test, and the total administration time is 15 minutes, including instruc- 
tions. This is one of the few I-D tests that cannot be scored by machine. 

The Adaptation Procedure 


The only difficulty encountered in the development of this test 
was to find the right words to explain the association between the 
symbols on the top and the bottom lines of the key. There was con- 
siderable confusion in the early tryouts; this was eventually traced 
to the use of the same word ("symbol" or "mark" or "thing") to refer 
[...] the distinction between [...] the stimuli and those that are the 
responses. Accordingly, the revised instructions begin with the 
explanation that the [...] symbol-mark pairs shown in the key [...]; 
the examinees are shown that on their test papers [...] fill them in 
[...] symbols and marks will be together. This simple change was 
fully effective. 

[...] an adequate number of practice problems also proved to be 
important. Two lines of practice problems are used to show the 
examinees that they are not to stop and wait after the first block 
of [...] understandings that an expanded practice session corrected. 


FIGURE 13 

Sample Items of the I-D Coding Test 

Intercountry Modifications 
The Coding Test was [...] the three countries. In Korea the original 
[...] modification. In Brazil the original items [...] retained, but 
administered with printed rather than oral [...]. In Thailand, 
certain of the symbols were changed; [...] multiple-choice version 
was also developed to pave the [...]. On the basis of the data so far 
assembled, the effectiveness of these various approaches seemed to 
be approximately [...]. 

Typical Results 

The estimates of reliability [...] Test are summarized in Table 11. 
All of [...] are reasonably high, as is usual for tests that depend 
largely on speed. 

The validity estimates [...] courses and against ratings supplied by 
supervisors of clerical workers were in the range of .30 [...] 
African studies. In commercial and secretarial courses [...] Malawi 
an estimate of .32 was obtained. In Thailand an average estimate of 
.28 was obtained for students in commercial courses; [...] data on 
clerical performance were not co[...]. In general, a figure 
near .30 can usually be expected for trainees or students, and some- 
what better results for clerks already employed. 

TABLE 11 

Typical Reliability Estimates of the I-D Coding Test 

Country        Group                         Education    Number    r* 

West Africa    Mixed                         6-7 years     3,770    .87 
West Africa    Mixed                         8-9 years       895    .86 
Brazil         Commercial trainees           Mixed           230    .81 
Korea          Telecommunications trainees   Mixed           138    .86 
Thailand       Commercial students           11 years         61    .89 

*The coefficients were based on the correlation between [...] 


I-D Test 13: Names 

The Names Test [...] primary emphasis is on accuracy rather than 
[...]; it is entirely perceptual [...]. [...] Coding, it is a 
job-element [...] for proofreading, for [...] attention to detail is 
important. 

General Description 

[...] the two spellings of each name are exactly the same. (The 
words [...] a separate form would not have to be [...] French- 
speaking African countries.) The test is divided into two parts with 
a time limit of two minutes [...]; [...] administration time 
is 15 minutes, including instructions. The one-page test paper can 
be scored by machine. 

The Adaptation Procedure 

[...] this test. The [...] selected [...] newspapers, telephone 
books, and other African sources; [...] introduced by changing [...] 
a letter, [...] letter, or transposing two letters of the original 
version. [...] developed was found to be satisfactory, and 
improvements were not attempted. 

[...] thoroughly understood by all [...] therefore required [...] 
explanation of the [...] correct marking procedure and to emphasize 
the [...] of speed, the standard I-D techniques are applied. 


FIGURE 14 

Sample Items of the I-D Names Test 

Abeni 0. Olukoya          Abeni O. Olakoya          CORRECT   different 
Albert Fowode             Albert Fowonde            CORRECT   different 
Donald Strickler          Donald SUickler           CORRECT   different 
H.G.Biyoque               H. G. Biyoque             CORRECT   different 
Jean Reynaua              Jean Reynaud              CORRECT   different 
A. S. A. Gbajabianaila    A. S. A. Gbaiabiamila     CORRECT   different 
Zlado Dabbagh             Ziadu Dabbagh             CORRECT   different 


[...] European names, the original version of [...] was considered 
unsuitable for use in the other countries. Modified forms [...] 
in Brazil and Thailand; in Korea, a test of this type [...]. 
The Brazilian version [...] African findings. However, the [...] 
replaced with a similar test based on comparisons of numerals rather 
than names. It would be [...] failure of this test in Thailand was 
the result of certain characteristics inherent in the Thai alphabet, 
or [...] idiosyncrasies of the [...] or of the examinee [...]. 
[...] seem to be no logical [...] tests have different properties 
with different alphabets, but the matter was not pursued. 

Typical Results 

[...] estimates of reliability typically [...] 

The estimates of validity against [...] performance could not be 
[...] of .30-.35 can be [...]. For countries in which the Roman 
alphabet is used, [...] test [...] 


TABLE 12 

Typical Reliability Estimates of the I-D Names Test 

Country        Group                  Education    Number    r* 

West Africa    Mixed                  6-7 years     3,675    .73 
West Africa    Mixed                  8-9 years       976    .76 
Brazil         Commercial trainees    Mixed           230    .81 

*All of these coefficients are based on the correlation between 
the separately timed halves of this test. 

I-D Test 15: Table Reading 

[...] difficult of the clerical series. It requires [...] an element 
of the spatial orientation involved in the [...] reading test. In the 
higher-level I-D commercial [...] Graphs Test, which [...]. 

[...] the type illustrated in Figure 15. It is [...] into [...] 
minutes each and [...] administered in a total [...] overall. The 
one-page test paper can be scored by machine. 


FIGURE 15 

Sample Items of the I-D Table Reading Test 

[Figure: a table headed "NUMBER OF TESTS GIVEN AT EACH SCHOOL," with 
days of the week (MON., TUE., WED., ...) as row headings, school 
letters (A, B, C, ...) as column headings, and numerals in the cells.] 


[...] were encountered [...] 
itself, and this profusion of numbers confused many examinees. The 
change to the use of days of the week and letters as the tabular head- 
ings, so that numerals appeared only within the cells, resolved most 
of the initial difficulties that had been noted. 

When used at the primary school level, the test continued to 
require highly detailed explanations, however. It was necessary to 
assume that at least some of the examinees had never seen a table 
before, and to explain the concept as though this were true of the 
entire group. The fastest way to do this, it was discovered, was to 
teach the examinees to place one index finger on "Tuesday" and one 
on "School E," and then to bring the fingers together to find that "869" 
is the answer required. After a few problems have been worked using 
this finger method, the examinees understand the concept of tables, 
and the test can be given. 

Intercountry Modifications 


The Table Reading Test was translated into Portuguese, Thai, 
and Korean with no substantive changes. And, although only limited 
data were assembled, these direct translations appeared to be suffi- 
ciently effective that the test could be considered ready for use without 
further modification. 


Typical Results 


[...] typically obtained for the Table Reading [...] grades in 
business and commercial courses and against [...] clerks already 
employed were in the [...] West African studies. [...] of .30-.40 for 
grades [...] courses; in Malawi the test was ineffectual as a 
predictor of performance in clerical and secretarial training. [...] 
it appears that figures of .35-.40 can normally be expected. 


TABLE 13 

Typical Reliability Estimates of the I-D 
Table Reading Test 

Group (the only column that survives) 

Mixed 
Mixed 
Commercial trainees 
Commercial students 

[...] coefficient [...] Thailand estimate. 


MANUAL ABILITY TESTS 


Three of the I-D tests are addressed to manual skills of the 
type required in some semiskilled jobs and in all of the skilled trades. 
One is an extremely simple test of virtually pure motor speed and 
is intended for use in screening illiterate or semiliterate examinees. 
The two others require skilled eye-hand coordination and are intended 
for vocational selection at the postprimary level. 


The I-D Marking Test is the lower-level screening device, 
suitable for jobs that require high speed but little skill, such as cer- 
tain assembly-line operations. The I-D Manual Dexterity Test mea- 
sures dexterity in using the larger arm muscles, and the I-D Finger 
Dexterity Test measures similar skills in the use of the [...] mus- 
cles alone. One or the other of these two tests is included in selec- 
tion programs for the skilled trades, in accordance with the nature 
of the manual skills that the trade to be learned requires. 


I-D Test 21: Marking 

The Marking Test is part of the I-D [...] 
also includes the Similarities and Checking Tests. [...] 
two instruments, it has so far found little practical application, and 
will be described only briefly. 

General Description 

The Marking Test consists of 300 squares [...] ten, as illustrated 
in Figure 16. The task is simply [...] each square as rapidly as 
possible. The score is [...] squares completed. The test is 
administered [...] squares each, with a time limit of two minutes 
[...]; the total administration time is fifteen minutes, including 
instructions. The Marking Test cannot be scored by machine. 


FIGURE 16 

Sample Items of the I-D Marking Test 

□□□□□□□□□□ 

□□□□□□□□□□ 


The Adaptation Procedure 

[...] in which the examiner himself [...] administered twice, the 
first time for practice to teach the [...] two-minute time limit 
elapses. No adaptation problems of any kind were encountered. 


Intercountry Modifications 

[...] Korea. 

Typical Results 

[...] the Marking Test are summarized in Table 14. [...] expected 
of a test that depends mainly on speed. 

[...] workers was [...]. In West Africa validity studies were not 
undertaken. [...] projection of validity for the screening of 
semiskilled workers can be attempted. 


I-D Tests [...] Finger Dexterity 

[...] of the two for [...] selection into general vocational 



TABLE 14 

Typical Reliability Estimates of the I-D Marking Test 

Country        Group              Education     Number    r* 

West Africa    Primary (girls)    6-7 years         44    .87 
Korea          Textile workers    Elementary       144    .92 

*Both coefficients are based on the correlation between the 
separately timed halves of this test. 


applications. The latter is most effective for trades such as that of 
electrician, in which precise finger movements are particularly 
important. 

General Description 

Sample items of both tests are illustrated [...]. The Manual 
Dexterity Test consists of two [...] 6" x 5" in size; the examinee 
is to draw a [...] 1/8" shaded path, beginning at the arrow, and 
[...] to the numeral 100 as he can within the time limit [...] 
minutes per part. His score is the value of the [...] number of 
times he strays outside the shaded [...]. The Finger Dexterity Test 
consists of 90 similar but much s[...] exercises each about 
5/8" x 1/2" in size. It is [...] parts with a time limit of two 
[...] the number of exercises completed [...] time for each test is 
fifteen minutes; neither can be scored by machine. 


The major problem in the [...] these tests was to teach the [...] 
appropriate blend of care and speed that is [...] maximum score. 
This blend is apt to be different for [...] examinees, and for this 
reason [...] the examiner in the sample [...] letting him evaluate 
[...] 


FIGURE 17 

Sample Items of the I-D Dexterity Tests 

Manual Dexterity          Finger Dexterity 


Thus, at the end [...] Dexterity Test the examiner tells all those 
who finished [...] the many mistakes that they [...] more slowly; 
similarly, he tells those who did not [...] that they must work more 
quickly to obtain [...]. [...] method of successive approximations 
proved to be an [...] who had the basic manual abilities that are 
required. 

[...] of the testing pro[...] technique for scoring the papers. 
[...] was entirely [...]. [...] attempt was to [...] papers 
chemically, so that [...] the reverse side of [...] paper, which 
could then [...] counted. But this was [...] too dependent on the 
pressure the examinee applied when [...] sufficiently trustworthy 
[...] path on a [...] so that only the [...] show through [...] 
superimposed [...] 


Typical Results 

The estimates of reliability typically obtained for the two dex- 
terity tests are summarized in Table 15. It will be seen that both 
tests are highly reliable despite the many subjective judgments that 
have to be made in the process of scoring. 

Examinees in the skilled trades were tested only in the African 
studies. Validity estimates for the Manual Dexterity Test against 
grades in shop courses and against foreman evaluations were in the 
range of .35-.40 for a variety of trades (except that of electrician), 
and in the subsequent study at the Malawi Polytechnic a figure of .55 
was obtained. The validity of the Finger Dexterity Test was consis- 
tently lower for all groups except electrical workers, where the 
estimate of .37 far exceeded that of the Manual Dexterity Test. In 
general, figures near .40 can be projected. 



CHAPTER 7 

DEVELOPMENT OF OPERATIONAL TESTING PROCEDURES 


The design of an operational testing program involves two 
further developmental procedures. The first is to determine which 
of the available tests should be used for this particular application to 
obtain the highest possible accuracy within the practical constraints 
on time and cost that usually limit the total number of tests the appli- 
cants can be given. The second is to formulate an appropriate pro- 
cedure for combining the separate test scores that will be obtained 
into a single index of the applicants' respective potential. Both raise 
issues and problems that require attention as careful as that paid to 
development of the actual testing procedures. 

This chapter begins with a brief overview of these issues and 
of the approaches that were adopted for resolving them in the AID/AIR 
research. Then this developmental procedure will be illustrated for 
scholastic, technical, and clerical selection programs, drawing on the 
actual applications of the I-D tests for all of the statistical data re- 
quired. 


BASIC DEVELOPMENTAL PROCEDURES 

Once the individual tests have been developed, the methods used 
in designing an operational testing program are entirely independent 
of cultural variations. The programs that are produced by these 
methods may vary from country to country, as a result of culture-tied 
differences in the tests’ measurement characteristics. But the meth- 
odology itself need not be adapted. 

Thus, the techniques described in this chapter are all standard 
approaches. They are included partly for purposes of continuity, and 
partly because adequate descriptions of a number of these techniques 
are not available in many developing countries. 


Comparability of the Test Scores 

One basic characteristic of composites of scores is that superior 
performance on one test will compensate for poor performance on 
another when the scores are all added together. If one applicant earns 
ten points more than another on Test A but ten points less on Test B, 
the two will appear to be equally able when the results on Tests A and 
B are combined into a single index of their potential. That a ten-point 
difference on Test A is equal to a ten-point difference on Test B is the 
[...] 

[...] is usually false for the raw scores that the [...] separate 
tests. And the practice of adding [...] different examinations, 
common [...], always introduces distortion. [...] combined, they 
must be converted [...] uniform [...] differences such as the above 
in fact are the same. 

[...] scores to a uniform scale have been developed. The [...] 
AID/AIR studies was the "stanine" approach, [...] solution to the 
scaling problem and a simplification of [...] purposes of processing 
and reporting. It consists of dividing the range of raw scores earned 
by the examinees into nine [...] same segment to the same stanine 
score. Each score in [...] segment is converted to a score of 1; each 
score in the [...] segment to a score of 9; and each score in the 
intervening [...] corresponding integer between these extremes. 
[...] straddles the mean, so that there are four and one-half [...] 
midpoint of the overall distribution. 

[...] amount of distortion depends on the degree to which [...] 
respect to the dispersion of the examinees [...] 

If the distribution of raw scores is perfectly normal, the result- 
ing scores will have the properties illustrated in Figure 18. Each 
segment will contain a range of raw scores equal to one half of one 
standard deviation, and the percentages of examinees earning each of 
the nine stanine scores will be as shown. A score of Stanine 9 will 
indicate performance superior to that of 96 percent of the exam- 
inee population; a score of Stanine 7 will indicate superiority to 77 
percent of the population; and so on, throughout the entire range. 
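
The percentages that Figure 18 displays follow directly from the normal curve once it is cut into half-standard-deviation segments. As an illustrative check (not part of the original procedure), the sketch below computes them with Python's standard library; the function name is hypothetical.

```python
from statistics import NormalDist

def stanine_percentages():
    """Percent of a normal population falling in each of the nine stanines.

    The seven inner segments are each one half of a standard deviation
    wide, with Stanine 5 centered on the mean; Stanines 1 and 9 take
    the two open-ended tails.
    """
    normal = NormalDist()  # mean 0, standard deviation 1
    cuts = [-1.75 + 0.5 * k for k in range(8)]  # -1.75, -1.25, ..., 1.75
    cumulative = [0.0] + [normal.cdf(c) for c in cuts] + [1.0]
    return [100 * (cumulative[i + 1] - cumulative[i]) for i in range(9)]

percentages = stanine_percentages()
rounded = [round(p) for p in percentages]
print(rounded)            # percent earning Stanines 1 through 9
print(sum(rounded[:8]))   # percent scoring below Stanine 9
print(sum(rounded[:6]))   # percent scoring below Stanine 7
```

Rounded to whole numbers, the segments hold 4, 7, 12, 17, 20, 17, 12, 7, and 4 percent of the population, which gives the figures of 96 and 77 percent cited for Stanines 9 and 7.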

Since the distribution of the scores of a finite [...] exam- 
inees is not apt to be perfectly normal, however, [...] 
these figures must be expected. But these [...] generally 
small and, at least for the I-D tests, close fits to the theoretical dis- 
tribution have consistently been obtained, even for samples of only 
100 examinees. 


FIGURE 18 

Stanine Conversion of a Perfectly 
Normal Distribution 

Mechanically, the conversion to stanine scores is best accom- 
plished by using not the standard deviation values of Figure 18 (i.e., 
giving the examinees who score from .25-.75 σ's above the mean a 
converted score of Stanine 6), but the percentage values of examinees 
within each segment, as follows: Array the raw scores in order, from 
highest to lowest, and identify the Stanine 5 group by counting 40 per- 
cent down from the top and 40 percent up from the bottom of this dis- 
tribution. If exactly 40 percent is not feasible (because too many exam- 
inees have the same score at the point of division), use a percentage 
somewhat higher or lower, such that the numbers of examinees above 
and below the Stanine 5 interval will be most nearly the same. Then, 
follow the same procedure to identify the scores of Stanine 7 or higher 
[...] counting 23 percent down and 23 percent [...] percentages if 
necessary to make the two [...] the two groups into [...] with the 
number of Stanine 1 scores, the Stanine 8's [...] 3's, at the [...] 
intervals most [...]. [...] inspect the results and make any further 
adjustments [...] division that will achieve a closer fit to the 
theoretical [...]. 

[...] part of the [...] norms are developed, or 
separately for each applicant group after they have been tested. 
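
A percentage-based conversion of this kind can be sketched in a few lines. The version below is a simplification and an assumption, not the handbook's exact routine: instead of working outward from Stanine 5 and balancing the upper and lower groups by hand, it places each of the eight division points at the raw score whose cumulative percentage comes closest to the theoretical value. Tied raw scores always receive the same stanine. The function names and the sample data are hypothetical.

```python
from bisect import bisect_right
from collections import Counter

# Cumulative fractions of examinees falling below Stanines 2 through 9,
# taken from the percentages 4, 7, 12, 17, 20, 17, 12, 7, 4.
CUM_TARGETS = [0.04, 0.11, 0.23, 0.40, 0.60, 0.77, 0.89, 0.96]

def stanine_cut_scores(raw_scores):
    """Return eight cut scores; cuts[k] is the lowest raw score of Stanine k + 2."""
    n = len(raw_scores)
    counts = Counter(raw_scores)
    values = sorted(counts)
    below = []    # fraction of examinees scoring below each distinct value
    running = 0
    for v in values:
        below.append(running / n)
        running += counts[v]
    cuts = []
    for target in CUM_TARGETS:
        # Divide at the distinct score whose "fraction below" is nearest the target.
        nearest = min(range(len(values)), key=lambda i: abs(below[i] - target))
        cuts.append(values[nearest])
    return cuts

def to_stanine(score, cuts):
    """Stanine = 1 plus the number of cut scores at or below the raw score."""
    return 1 + bisect_right(cuts, score)

# Hypothetical group of 100 examinees with raw scores 1 through 100.
scores = list(range(1, 101))
cuts = stanine_cut_scores(scores)
distribution = Counter(to_stanine(s, cuts) for s in scores)
print([distribution[k] for k in range(1, 10)])
```

With many tied scores or very small groups, two division points can land on the same raw score, in which case one stanine value is simply skipped; a final inspection of the resulting distribution, as the text recommends, remains advisable.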

[...] divided by the number of tests to obtain [...] "average" 
stanine score. Rather, the distribution of the summed [...] sub- 
jected to the above procedure so that [...] converted to single- 
digit stanine results. This final [...] appropriate index 
of the applicant's overall performance on the various tests he was 
given. 


[...] effect [...] grouping applicants who have somewhat different 
raw [...] seldom a drawback in [...] test applications; a nine[...] 
adequate for the decisions that are to be made. The [...] afforded 
by one-digit scores are especially attractive in a developing 
country, where [...] limited access to high[...] 


[...] Measured 

[...] programs based on a [...] scores are not independent. Partly 
because the abilities that the tests are intended [...] 
to occur together in human performance, and partly [...] 
similarities in the testing procedures (such as the [...] 
speed, for example), most pairs of test scores are correlated 
to a certain extent. If this correlation is high, the two tests will 
provide so much overlapping information that the use of [...] 
only slightly more accurate than can be obtained from using the more 
valid one only. 

For this reason, the intercorrelation of [...] 
an important consideration in the selection [...] 
applied. After the one or two [...] selected, the 
next most useful test will frequently not be the one with the 
next highest validity, but a somewhat [...] less 
overlap with the [...]. [...] design an efficient 
testing program, knowledge of the test intercorrelations is almost 
always essential. 
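
The effect of overlap on the choice of a second test can be illustrated with the standard formula for the multiple correlation of a criterion with two predictors. The validities and intercorrelations below are invented for the illustration and are not taken from the I-D data.

```python
from math import sqrt

def multiple_r(r1, r2, r12):
    """Multiple correlation of a criterion with two tests.

    r1, r2: validities of the two tests against the criterion.
    r12:    correlation between the two tests (their overlap).
    """
    r_squared = (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)
    return sqrt(r_squared)

# Hypothetical case: Test A (validity .40) has already been selected.
validity_a = 0.40
# Each candidate: (validity, correlation with Test A).
candidates = {"B": (0.35, 0.80), "C": (0.30, 0.20)}

combined = {
    name: multiple_r(validity_a, validity, overlap)
    for name, (validity, overlap) in candidates.items()
}
for name, value in combined.items():
    print(f"A + {name}: combined validity {value:.2f}")
```

In this invented case Test B, though more valid by itself, raises the combined validity hardly at all (to about .40), while the less valid but largely independent Test C raises it to about .46.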

Many techniques [...] correlations among tests into [...] have 
been developed. Of these, the [...] factor analysis earlier 
mentioned have been the [...]. A more recently developed 
statistic is a coefficient of the [...] of each of the tests in a 
series that will be used [...]. [...] reduction can be help- 
ful, but in the AID/AIR studies [...] found adequate in designing a 
[...] examine the raw correlational [...] applications that follow 
this background discussion. 


Differential Test Weights 

Because of the [...] the validity and overlap of the 
various test scores, the [...] testing program is generally 
increased by [...] with different weights, so as to 
[...] maximally predictive.* 

[...] 

Thus, the validity estimates cited in the illustrative applications will 
uniformly be minimum estimates, improvable to some extent by 
appropriate weighting procedures. 
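
The gain available from differential weighting can be illustrated for the two-test case. The sketch assumes that the unweighted alternative is a simple sum of standardized scores and that the weights are the usual least-squares (regression) weights; the validities and intercorrelation are invented, and the function names are hypothetical.

```python
from math import sqrt

def composite_validity(w1, w2, r1, r2, r12):
    """Validity of the weighted sum w1*z1 + w2*z2 of two standardized tests."""
    numerator = w1 * r1 + w2 * r2
    denominator = sqrt(w1 ** 2 + w2 ** 2 + 2 * w1 * w2 * r12)
    return numerator / denominator

def regression_weights(r1, r2, r12):
    """Standardized least-squares weights for two predictors."""
    d = 1 - r12 ** 2
    return (r1 - r2 * r12) / d, (r2 - r1 * r12) / d

# Hypothetical validities (.45 and .20) and intercorrelation (.30).
r1, r2, r12 = 0.45, 0.20, 0.30

unit = composite_validity(1, 1, r1, r2, r12)
b1, b2 = regression_weights(r1, r2, r12)
weighted = composite_validity(b1, b2, r1, r2, r12)

print(f"equal weights:      {unit:.2f}")
print(f"regression weights: {weighted:.2f} (weights {b1:.2f} and {b2:.2f})")
```

Weights derived from a sample capitalize to some degree on chance, so the advantage shown by such a computation usually shrinks when the weights are applied to a new group of applicants.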


Reliability of a Test Combination 

The overlap among the separate tests that limits the validity 
available from the composite has exactly the opposite effect on the 
reliability of the overall index obtained; each pair of overlapping tests 
will provide two separate estimates of the ability common to both and 
thereby increase the overall accuracy of its appraisal. Even tests of 
only moderate reliability can in combination result in a highly reliable 
score. 


The reliability of a composite can be computed by formula from 
the individual test reliabilities and the inter-test correlations. The 
computational procedure is illustrated in [...] 
incorporates the lowest reliability estimate (that of [...] 
Information Test for last-year secondary school students) obtained 
[...] the West African studies. 
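
One common form of such a formula, for a simple sum of equally weighted standardized scores, is sketched below. It is offered as an assumption about the kind of computation intended, not as a reproduction of the handbook's own worked example, and the reliabilities and intercorrelations used are invented.

```python
def composite_reliability(reliabilities, intercorrelations):
    """Reliability of an equally weighted sum of standardized test scores.

    reliabilities:     one reliability estimate per test.
    intercorrelations: one correlation per distinct pair of tests.

    The composite's variance is n plus twice the sum of the
    intercorrelations; its error variance is the sum of the individual
    error variances (1 - reliability).
    """
    n = len(reliabilities)
    error_variance = sum(1 - r for r in reliabilities)
    total_variance = n + 2 * sum(intercorrelations)
    return 1 - error_variance / total_variance

# Hypothetical three-test battery, one test of only moderate reliability.
reliabilities = [0.85, 0.80, 0.60]
intercorrelations = [0.50, 0.45, 0.40]   # pairs (1,2), (1,3), (2,3)

r_composite = composite_reliability(reliabilities, intercorrelations)
print(f"composite reliability: {r_composite:.2f}")
```

In this invented case the composite reliability (about .87) exceeds that of every individual test, which is the effect the preceding section describes.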

The three tests to be used in combination are the High form of 
the Reading Comprehension Test ([...]), [...] 
(SCI), and the test of World Information (WLD). They are given 
to West African students in their last year of secondary school, as 
[...] estimate of further academic potential. 

To estimate [...] of this [...] presented in Figure 19. The 
reliability [...] 


FIGURE 19 

[...] 

Nevertheless, there are a number of practical guidelines for 
the design and use of criterion measures that proved useful in the 
United States and that were found to be entirely generalizable to re- 
search in a developing country. These general guidelines, applicable 
to the evaluation of individual instruments as well as composites, will 
be summarized first; then the issues peculiar to combinations of tests 
will be considered. 

Criterion Measures 

From the point of view of research design, there are two broad 
categories of criterion measures. The first are those that have 
already been established as standards of performance by the educa- 
tional or training institution, and that must be used in the study to 
provide operationally meaningful evaluations. The second are those 
that are developed specifically for purposes of test validation, and 
that are administered as an integral part of the research procedure. 
The degree of control that the test constructor can exercise is quite 
different in these two situations, and they therefore have different 
design implications. 

Fixed-Criterion Studies 


When the students or trainees are required to pass a certain 
examination or maintain a specified grade-point average as a pre- 
requisite for graduation, any alternative index of their performance 
is necessarily a less meaningful criterion measure. For, even if it 
can be shown that another index will provide a better indication of 
their performance "later in life," selecting students who have the 
highest ultimate potential but will never realize it because they are 
unable to pass the course clearly is self-defeating. Until the existing 
standards are changed (perhaps as a result of a study that demonstrates 
their poor correlation with later performance), the established criteria 
should be applied.* 

In these situations the most important guideline the test con- 
structor should follow is to base the selection of tests on a detailed 
examination of the criteria that actually will be applied. The title of 
the course and the stated teaching objectives are seldom adequately 
descriptive. The content of the curriculum, the emphasis of the in- 
[...] 


*The use of additional indexes, so as to select students who 
will pass the course and also perform better later is, of course, an 
improvement. The point of the discussion is that established standards 
cannot be ignored. 


[...] small budget that can be devoted to criterion development and 
by the [...] resulting measure must be accepted unequivocally by the 
officials of the institution for which the program is being developed. 
As a result of these practical limitations, the opportunity for innova- 
tion that is afforded by these more flexible situations is seldom 
exploited. Most specially developed criterion measures consist of 
indexes based on grades or ratings or other subjective evaluations. 

The AID/AIR studies did not venture beyond these basic kinds 
of subjective measures. But it was found that certain of these ap- 
proaches were more effective than others, and eventually all validity 
studies were carried out with one of three data collection approaches. 
They provide successively better results, and are successively more 
expensive. 

The first of these was the use of course grades converted to a 
comparable scale. Grades were used only when considerations of
distance or time made expediency essential, or when a comparison of 
this type was specifically requested. Depending on the curriculum, 
grades in four to seven separate courses were obtained for each of 
the examinees, and then combined into a single index of overall class- 
room performance. 

When the grades were expressed in the form of percentages or 
similar units, they were converted to stanine scores before they were
added, for the reasons earlier noted. When the stanine approach was 
impractical because the grades were already grouped into an even 
smaller number of categories (e.g., letter grades from A through E), 
an equivalent procedure was used. 

Assume that the classroom performance of the individual students 
is normally distributed, and that each grouping of grades therefore 
represents a segment of an underlying normal distribution. On the 
basis of this assumption, the position of each segment along the normal 
curve can then be determined by consulting a table of the normal
probability distribution.

In Figure 20, for example, the tabulation at the left shows the
distribution of grades that a group of 200 examinees obtained in one
of their courses. By consulting a table of the normal curve, it can be
determined that the 10 percent who obtained A's represent an interval
beginning 1.28 standard deviations above the mean, because this is the
interval that contains the top ten percent of a normal distribution.
Similarly, it can be determined that the interval that contains the next
highest 30 percent begins at 0.25, and so on, for each of the five letter
grades. A pictorial representation is shown at the right.
FIGURE 20

Normalizing a Distribution of Course Grades 
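
The table lookup described for the letter grades can also be done numerically. The following is a minimal sketch, assuming Python's standard `statistics.NormalDist`; the function name is hypothetical, and the proportions shown for grades C through E are assumed for illustration only (the text gives only the 10 percent of A's and the 30 percent of B's).

```python
from statistics import NormalDist


def segment_lower_bounds(proportions):
    """Return the z-score at which each grade's segment of the normal curve begins.

    `proportions` lists (grade, fraction of examinees) from the highest
    grade to the lowest. The lowest grade has no lower bound and is omitted.
    """
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    bounds = {}
    fraction_at_or_above = 0.0
    for grade, fraction in proportions[:-1]:
        fraction_at_or_above += fraction
        # The segment begins at the point with this fraction of the curve above it.
        bounds[grade] = standard_normal.inv_cdf(1.0 - fraction_at_or_above)
    return bounds


# A and B fractions are from the text; C, D, and E are assumed for illustration.
grades = [("A", 0.10), ("B", 0.30), ("C", 0.35), ("D", 0.20), ("E", 0.05)]
bounds = segment_lower_bounds(grades)
for grade, z in bounds.items():
    print(f"{grade} begins at {z:+.2f} standard deviations")
```

With these inputs the A segment begins at about 1.28 and the B segment at about 0.25 standard deviations above the mean, in agreement with the values read from the table.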

assumption that the difference between an A and a B is equal to the
difference between a C and a D, within the same course
different courses. In the illustrative

sented, the above approach was used for all validity studies
these types of criterion measures.

A second and generally more satisfactory oMhe^re- 

teacher or supervisor evaluations especially usually 
also given a ranking form which consists simply
blank spaces, as illustrated in Figure 21.

FIGURE 21

Form for Alternate Ranking Procedure
At each point, "most able" or "least able" is the only judgment required,
and this is one of the simplifications afforded by this procedure. A
second is that the evaluation of the individuals between the extremes,
which is usually the most difficult task, is deferred until the list has
been reduced to more manageable proportions.
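
The logic of alternate ranking can be sketched in a few lines. This is an illustration only, assuming the usual form of the procedure in which the rater alternately names the most able and then the least able of the individuals not yet ranked; the function and the two judgment callbacks are hypothetical stand-ins for the rater, and the scores are invented.

```python
def alternate_ranking(names, pick_most_able, pick_least_able):
    """Build a full ranking from repeated 'most able' / 'least able' judgments."""
    remaining = list(names)
    top = []     # filled from the first rank downward
    bottom = []  # filled from the last rank upward
    choose_top = True
    while remaining:
        if choose_top:
            chosen = pick_most_able(remaining)
            top.append(chosen)
        else:
            chosen = pick_least_able(remaining)
            bottom.append(chosen)
        remaining.remove(chosen)
        choose_top = not choose_top
    # The bottom picks were collected worst-first, so reverse them.
    return top + bottom[::-1]


# Hypothetical "true" ability used only to simulate the rater's judgments.
ability = {"Ade": 71, "Bola": 55, "Chidi": 88, "Dayo": 62, "Emeka": 40}
ranking = alternate_ranking(
    ability,
    pick_most_able=lambda group: max(group, key=ability.get),
    pick_least_able=lambda group: min(group, key=ability.get),
)
print(ranking)
```

Note that the middle of the list is reached last, which is the deferral of the most difficult judgments that the text describes.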

data collector conducts each of these sessions, a degree of
comparability not available from a general explanation of the procedure
above is provided.

Because of the demands of this approach, it was used only 
occasionally, but uniformly excellent results were obtained. 

Design of the Validity Study 

In either the fixed or the optional criterion situations, the optimal 
design of the validity study is as follows:

1. To administer the experimental tests to the group of applicants 
from which the students or trainees will be selected; 

2. To ensure that the test results are not used as a basis for 
these selection decisions; and 

3. To collect the criterion data in a follow-up study, after suf- 
ficient time has elapsed to assess the relative proficiency of the indi- 
viduals who were selected. 

From the correlation between the test scores and demonstrated pro- 
ficiency, the improvements that would have resulted from the use of 
the tests can then be estimated with reasonable precision. Had the 
tests been used as part of the actual selection process, only applicants 
above a certain cutoff score would have been admitted; and the differ- 
ence between the proficiency of students at these scores and the pro- 
ficiency of the students in fact selected is the magnitude of the improve- 
ment that would probably have been effected. Whenever possible, this 
is the design that should be applied to validity studies. 
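
The estimate described above can be sketched as follows. This is a simplified illustration, not the procedure actually used in the studies: the data are invented, the function names are hypothetical, and the "improvement" is taken simply as the mean criterion score of students at or above the cutoff minus the mean of all students in fact selected.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / sqrt(variation_x * variation_y)


def estimated_improvement(test_scores, proficiency, cutoff):
    """Mean proficiency at or above the cutoff minus mean proficiency of all selected."""
    mean_all = sum(proficiency) / len(proficiency)
    above = [p for t, p in zip(test_scores, proficiency) if t >= cutoff]
    return sum(above) / len(above) - mean_all


# Hypothetical follow-up data: test stanines recorded at admission (and not
# used for selection), with criterion scores collected later.
test_scores = [3, 4, 5, 5, 6, 7, 8, 9]
proficiency = [40, 55, 50, 60, 65, 70, 72, 85]

validity = pearson_r(test_scores, proficiency)
gain = estimated_improvement(test_scores, proficiency, cutoff=6)
print(f"validity = {validity:.2f}, estimated gain = {gain:.1f} criterion points")
```

Because the test results played no part in the original selection, students with low test scores are present in the follow-up group, which is what makes the comparison possible.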

In many situations, however, waiting for the results of a
follow-up study is unrealistic. This is generally the case in the validity
studies of individual tests carried out during the initial development
of these procedures, since it is quite impractical to wait one year or
longer between successive revisions. This is also the case in any
subsequent application in which the results are to be used for an
immediate decision, as in a feasibility study to assess the payoff of
a proposal already pending. In these situations the estimate must be
based on a concurrent validity study, in which the tests are given to
a sample of individuals already in the course or the job, so that
criterion data on the examinees' demonstrated proficiency can be collected
at once.





This approach is also indicated when the test constructor is
prepared to wait for follow-up data but the institution that will apply
the tests is not. When there are strong pressures for immediate
improvements, many organizations will want from the first to use the
scores for selection decisions; this spoils the design because no
applicants with the lower test scores will be included in the follow-up
group. Some estimates can be made, but it is usually safer to rely

for a single test, since it is in fact only the single index of overall
potential obtained after the scores have been combined that is
compared with the criterion measure. But when the probable validity is
to be estimated in advance, on the basis of the validities typically
obtained from the individual tests, a somewhat more complex approach
is required.


The approach that is used is highly similar to that earlier
described for estimating the overall reliability of a test combination. It
is illustrated below using the same three tests that were used in this
earlier example. The first step again is to prepare a matrix, as
illustrated in Figure 22. This time it is the validity rather than the
reliability coefficients that are inserted in the diagonal cells.

The second step is to add the validity coefficients, exclusive of 
the intertest correlations. A sum of 1.16 is obtained. 

The third step is to total the entire matrix, using the value 1.00
in place of each validity coefficient. The result of 5.38 is the same 
as that obtained from the corresponding step in the reliability example. 
But in the validity computation, it is necessary to use the square root 
of this sum rather than the total itself. A figure of 2.32 is obtained. 

The last step is to divide the sum of the validities (1.16) by this 
square root. The quotient of .50 is the estimated validity of the com- 
posite. 
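
The four steps can be written out as a short computation. This is a sketch for equally weighted tests; the individual coefficients below are hypothetical, chosen only so that they reproduce the totals quoted in the text (validities summing to 1.16 and a matrix total of 5.38), and are not the entries of Figure 22.

```python
from math import sqrt


def matrix_total(intercorrelations, number_of_tests):
    """Total of the full intertest matrix with 1.00 in each diagonal cell.

    `intercorrelations` holds each pair of tests once; every pair appears
    twice in the full matrix (above and below the diagonal).
    """
    return number_of_tests * 1.00 + 2 * sum(intercorrelations)


def composite_validity(validities, intercorrelations):
    """Estimated validity of an equally weighted sum of the tests."""
    total = matrix_total(intercorrelations, len(validities))
    return sum(validities) / sqrt(total)


# Hypothetical values (assumptions), consistent only with the quoted totals.
validities = [0.45, 0.36, 0.35]          # sum = 1.16
intercorrelations = [0.45, 0.40, 0.34]   # the three distinct test pairs

total = matrix_total(intercorrelations, len(validities))
estimate = composite_validity(validities, intercorrelations)
print(f"matrix total = {total:.2f}, square root = {sqrt(total):.2f}")
print(f"estimated composite validity = {estimate:.2f}")
```

The same division explains why highly intercorrelated tests add little: each additional intertest correlation enlarges the denominator almost as fast as the added validity enlarges the numerator.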


In this illustration, the gain in validity attributable to the use 
of the two information tests in addition to the Reading Comprehension 
Test is negligible, as a result of the fairly high intertest correlations.
The reliability of the overall score will be improved, however, as
earlier noted.


FIGURE 22 
Converting Validity Coefficients to
Operational Measures 


There are many ways in which a validity coefficient can be
translated into a more meaningful operational measure. One common
approach is to express the improvement in terms of the criterion
measure that was applied, such as the increase that can be expected
in the average course grades the students will earn when the tests are
used for selection. A second is to compute the increase in the
probability that the average entrant will attain or exceed a fixed minimum
proficiency level. There are numerous other related approaches.
TABLE 16

Conversion of Validity Coefficients Into
Operational Improvements

                                     Percentage of Selectees Likely to
                                     Become More Proficient Than
Minimum Requirement                  Present Average Trainee
for Admission                        r=.45                         r=.65

Present requirements only             50      50      50      50      50
Present requirement and Stanine 6     68      70      72      74      77
Present requirement and Stanine 7     74      77      80      83      85
Present requirement and Stanine 8     80      83      86      89      92
Present requirement and Stanine 9     86      89      92      94      96
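
Figures of the kind shown in Table 16 can be approximated under a simple model. The sketch below is an assumption, not the documented derivation of the table: it supposes that standardized test scores and criterion scores follow a bivariate normal distribution among those meeting the present requirement, that the "present average trainee" sits at the criterion mean, and that Stanine 6 begins 0.25 standard deviations above the test mean. The function name and the numerical integration settings are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def share_above_present_average(validity, cutoff_z, steps=4000, upper_z=8.0):
    """P(criterion above its mean | test score at or above cutoff_z).

    Uses a midpoint-rule integration over test scores from the cutoff to
    `upper_z`, weighting the conditional probability at each score by the
    normal density of that score.
    """
    standard_normal = NormalDist()
    # Given test score x, the criterion is normal with mean validity * x
    # and variance 1 - validity**2, so P(criterion > 0) = cdf(slope * x).
    slope = validity / sqrt(1.0 - validity ** 2)
    width = (upper_z - cutoff_z) / steps
    weighted = 0.0
    weight_sum = 0.0
    for i in range(steps):
        x = cutoff_z + (i + 0.5) * width
        density = standard_normal.pdf(x)
        weighted += density * standard_normal.cdf(slope * x)
        weight_sum += density
    return weighted / weight_sum


share = share_above_present_average(validity=0.45, cutoff_z=0.25)
print(f"r = .45, Stanine 6 and above: {100 * share:.0f} percent")
```

For a validity of .45 and a Stanine 6 cutoff, this model gives a value close to the 68 percent shown in the first column of the table; with a validity of zero it returns 50 percent, the "present requirements only" row.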


Secondary School Admission Procedures

For school admission decisions, scores on scholastic aptitude
tests should normally be used in conjunction with two other measures.
The first is an index of the applicants' prior academic performance,
as shown by tests of achievement in the core primary school courses.
The second is an assessment of their personal characteristics, made
on the basis of individual interviews by the instructional staff of the
institution for which they are being selected. Applicants who have not
the necessary academic preparation or who exhibit pronounced personal
problems are unlikely to be successful, irrespective of their scholastic
potential.

The first step in designing the admission procedure, therefore,
is to develop an overall plan for the application of these three separate
measures. In practice, this almost always entails the development of
a sequential process in which the various measures serve as
successive hurdles, so that they can be applied to successively smaller
numbers. Because of the typically large ratio of applicants to selectees
(as high as 130 to 1 in some of the African studies), evaluating every
applicant on all of the measures is, if not totally impossible, at least
grossly inefficient.
The considerations pertinent to the design of such a sequential
process at the postprimary level are the following:

1. The order of the sequence should be from achievement test
to aptitude test to interview, in accordance with the costs of
these measures and their requirements for specially trained personnel.

2. As many of the applicants as can be accommodated at the
aptitude testing stage should be permitted to pass the achievement test
hurdle, so as to take account of the limitations of achievement tests
at this level, as described in Chapter 1.


3. The use of the interview should be limited to the rejection
of applicants who appear to be obviously unfit, because this type of
assessment is generally a poor basis for the identification of appli- 
cants with outstanding potential. 


In addition, these three conditions will generally have to be met, as
well as is practicable, within this further constraint:

4. Two separate testing sessions are the most that can be
justified, on administrative and logistic grounds, for a secondary
school admission procedure.


Asking the applicants to appear for testing on three separate occasions 
is usually too time-consuming and costly, particularly when consider- 
able travel would be required. 


The Three-Stage Selection Procedure 

In practice, these conditions mean that a three-stage process is 
feasible only when the first stage has already been carried out as 
part of the regular primary school cycle—i.e., when a national achieve- 
ment examination is administered at the end of the primary course as 
the basis for certifying successful completion. In this case, the scores
of those who decide to apply for the secondary schools can be obtained 
from the records without extra testing, and can be used as the first 

sessions, one for the aptitude tests
and one for the interviews, need be conducted.

of the total applicant pool that is invited on the
examination to appear for aptitude

of view, it is appropriate to invite
three to five times the eventually to be admitted; since this

The Two-Stage Selection Procedure

and in the interviewing of

that will tend to prevent either deficiency from

operations. It is the approach that was generally

AIR studies when exit examinations were not used at the primary


Selecting the Aptitude Tests 

Once the overall plan has been developed, the aptitude tests that
best fit the technical and practical requirements can be selected. In
the case of the I-D series, there are three basic tests related to
performance in secondary school academic courses, as will be recalled
from the descriptions in Chapter 6. These are the Verbal Analogies
(Low), Reading Comprehension (Low), and Arithmetic Tests. Their
essential statistical properties are summarized in Table 17, based
on the West African data.

If a three-stage selection procedure is to be used, all three of 
these tests can usually be applied at the second stage. The total
administration time is approximately one-and-a-half hours for the
entire set, so that each examiner can conveniently test two hundred 
or more students per day. The reliability of the series is approxi- 
mately .90, and the expected validity .58, when all three tests are 
equally weighted. 


In administering the tests, the Reading Comprehension Test
should be given first, since it is a convenient vehicle for teaching the
marking procedure, and involves no other operations unfamiliar to
the examinees. The Verbal Analogies Test is second; and the
Arithmetic Test, because of its requirement of speed, is the last. Special
attention must be given to the teaching of the speed concept in this
particular test series, because it includes no tests of intermediate
speed to prepare the examinees for the rapidity that the Arithmetic
Test requires.


is to be accomplished in only two stages, the most
the Reading Comprehension

achievement tests for the initial screen-

Analogies and Arithmetic Tests when
more than a single aptitude

of time the

tests as well as generally should include essay

And the Reading for the reasons earlier noted.

logical candidate for administration
by untrained examiners throughout the country.

TABLE 17

The I-D Postprimary Scholastic Aptitude Tests

etc., may be found in the I-D

criterion instruments,


For secondary schools

aptitude tests should be added
choice is the Mechanical

most valid of these procedures

examiner training. This it is generally the
approximately two hours, but as virtually no

performance can be expected. administration time to

against overall school

In general, then,
secondary school selection three

ties of .55-.60 or more, apart aptitude tests in the

admitting only applicants at qualifications
entirely feasible in many locations

Stanine 9 levels

FIGURE 23

Validity of the I-D Secondary School
Admission Tests

Minimum Requirements                   Percentage Selected Likely to Perform
for Admission                          Above Level of Present Average Student

Present standards only
Present standard and I-D Stanine 6     74%
Present standard and I-D Stanine 7     82%
Present standard and I-D Stanine 8     88%
Present standard and I-D Stanine 9     94%


aptitude

arts), a

unknown. If all each candidate should be applied
are used eventually

But the high degree of validity coef-

suggests that using both between the expected

technical grounds instrument Arithmetic Test;
of its lower Arithmetic Test worthwhile. On

superior, because

The needs and opportunities for the development of a multipurpose
another factor, therefore, that should be specifically
evaluated in the survey. If there is such a need, and if the project

include the following categories of payoff that have been suggested

improvements that could be made

of a professional

other societal change; and stimulating educational reform or

center evolves into that would be made if the testing
multipurpose behavioral research institution.

usually (which, in turn,

feasibility study of one necessary investment), a

to assemble adequate information sufficient

on these possibilities in a developing

the likelihood of success

would actually

overlooked, certain important factors

seemingly foolproof arrangements and certain

development project somehow to go astray.

there are no advance gua-

survey. And this outcome in the light

the AID/AIR

feasibility questions that
were found to be particularly important. It can offer also a rule of
thumb for evaluating the answers obtained. If the present situation
must change in order to achieve the desired payoff, the finding on this
aspect is a negative indication that reduces the project's potential.

If the payoff will be achieved unless the present situation unexpectedly 
changes, the finding is a plus that confirms the above projection. And, 
although there is no mathematical procedure for adding up these 
pluses and minuses, such additional feasibility data will naturally 
permit a more realistic decision. 

Three of the questions pertain to the feasibility of actually 
developing and putting into effect the specific testing services that 
have been projected, and should be applied to each of the potential 
test applications. The other four are addressed to the even more 
basic issue of the likelihood of success in developing a viable profes- 
sional institution in this country at this particular time, and under the 
specific arrangements proposed. 


Feasibility of the Proposed 
Test Applications 

The focus at this stage of the survey is not on the technical 
feasibility of the proposed applications, since these issues were con- 
sidered in compiling the list and all unrealistic suggestions were at 
that point deleted. Rather, it is on the political and pragmatic factors 
that determine whether or not a technically sound idea is likely to be 
implemented in the actual operational setting. From this point of 
view, the following three questions are especially important: 

1. Who is it that regards this problem as a serious "hurt" for
which a remedy is urgently needed? In compiling the list of testing
needs, a wide variety of sources was no doubt consulted. The relative 
credibility of these sources was presumably checked as part of the 
survey. Now these sources should be reviewed again, from a more 
pragmatic perspective. Given the political realities in this country, 
how much leverage can each of these sources exert in enlisting the 
needed cooperation to do the research, and in ensuring that the result- 
ing procedures will in fact be adopted? The greater the leverage of 
the prospective client, the higher the chance of operational
implementation.

Lowest on this criterion is any test application that the survey
team itself suggested on the basis of its own observations. No matter
how serious this need may be in the abstract, it is not until it is
perceived as such by the responsible officials that the support
necessary to solve it can realistically be expected. The present situation
has to be changed by selling the importance of the problem to the
people in charge, and success in effecting this change should not be
taken for granted. Applications proposed by the team itself should be
treated as long-shot payoff projections.


Somewhat more promising are the needs identified by the
professional community, such as university staff, since these individuals
can be effective partners in gaining support at the decision-making
level. But one should not overestimate the influence of the professional
in a developing country. Unless he happens to be in an especially
favored position, he may actually have greater difficulty in getting a
the visitor from abroad. On balance,
at the professional level should be
regarded as mildly negative on this first feasibility factor.

responsible officials themselves
these officials expectation, especially when
provide the support that will

whenever the enthusiasm for

top official, since limited to one

in many developing countries

suggests an optimistic appraisal the above rule

favorable

the organization represent
of the incumbents involved irrespective of the personalities
with a rise the candidates who faced

30,000 to 70,000 students is one such from
virtually assured. It is difficult support is

that would eliminate the

For the availability of excellent projected.

is no eager or even willing consumer

Ministry of Education. And unless other arrangements are made the
proposed professional unit would naturally be established as part of
that organization. A second important feasibility issue is the extent
to which it will be possible to cross bureaucratic lines of authority
and actually implement those test applications that are not part of
this same organizational structure.


The possibilities in this regard appear to vary greatly from 
country to country. In some countries it is entirely feasible to include 
even military testing within the planned scope of an education-based 
testing center; in others, programs sponsored by the Ministry of
Education ipso facto rule out the possibility of university participation,
and vice versa. Each of the proposed applications should be checked
from this purely bureaucratic point of view, and confirmed or discounted 
in accordance with local conditions. 

The best guide to this assessment, of course, is the history of
interactions among the organizations concerned. If, as in one country, 
the ministry and the university share staff, so that many of the senior 
personnel hold dual positions, services for both organizations can 
realistically be projected. If, as in many other countries, these organ- 
izations have set up separate units to carry out parallel functions, 
such a projection would be unduly optimistic, for, even though these
cooperative relationships may subsequently develop (and, in fact,
did in the case of two of the projects), this is not the most probable
expectation.

The greatest utility of this appraisal is in renegotiating the 
proposed project arrangements. Often it can be shown that the project 
will be a viable one only if it is carried out as a multiagency effort; 
sometimes this demonstration will prove persuasive. Agreement 
may be obtained to establish the center under the supervision of a 
governing board representative of the various organizations; this is 
a generally effective vehicle for ensuring a broad scope of services, 
as will be further discussed in the following chapter. 

3. To what extent does the logistic infrastructure necessary to 
implement each of the proposed applications already exist, and to what 
extent must it be created? Carrying out a large testing program,
especially on a nationwide level, requires a wide range of administra- 
tive mechanisms and arrangements. Because the employment of full- 
time personnel to conduct a few testing sessions per year is totally 
unrealistic, a large corps of field supervisors and proctors available 
on an as-needed basis must be recruited and trained. Because there 
will generally be only a limited number of facilities throughout the 
country that can accommodate large testing sessions, foolproof
provisions for the use of these facilities must be made. Because the
security of the tests is always a major concern, tightly controlled
transmittal and distribution procedures must be developed. The
starting-up costs of a new testing program, and the time that these
arrangements require, can be substantial.


Accordingly, each of the proposed applications should also be
checked from the point of view of logistic provisions. If the task is
simply to introduce new instruments and streamlined procedures into
an existing testing program that has been carried out for some years,
fairly rapid progress and early payoff can be expected. If the task
is to greatly expand or totally reengineer an established program,
progress is likely to be much slower. If the task is to introduce testing
into a situation in which testing has not been the practice before, a
considerable lag to actual payoff should be projected. The separate
applications that have been listed can vary widely in this regard.


The main utility of this further appraisal is that it adds a time 
perspective to the payoff projections. And this should be useful both 
for the feasibility evaluation and the subsequent scheduling of the 
activities to be undertaken. 


Viability of the Proposed Institution 

The other four questions pertain to the likelihood that the
institution to be developed (as distinct from the services it will provide)
will be successful. The institutions to which AID/AIR assistance has
been provided have ranged from dynamic establishments that have
increased their scope and professional reputation to perennially
struggling establishments that have survived only in the technical sense of
not emptying the building and locking the doors. And it is clearly
advance the status of the situational factors
that facilitate or obstruct the development of a viable institution.

specify all of these factors was, of course, far beyond

factors raised in the following

to explain the differences
in institutional development that were observed:

proposed institution's director?

a competent and preferably
charismatic director. Usually, it is during the first two years of an
institution-building effort that its eventual viability is effectively
predetermined; the crucial factor in this early stage has consistently 
seemed to be the quality of the director. He has the most significant 
direct effect on the program because many of the crucial steps are 
ones that he personally must accomplish, and a perhaps equally impor- 
tant indirect effect in that outside agencies and officials respond to 
an organization that has not yet developed its own "institutional 
character" mainly on the basis of who is in charge. 


If one of the country's top professionals will be assigned, and
if he is prepared to give the project enough of his time, fairly rapid
progress can be expected. If an expatriate specialist with equivalent
qualifications must assume these functions while the local specialists
are being trained, the results can be equally effective, but a lower
expectancy should be projected at the time of the survey. If it is
necessary to begin with a director (local or expatriate)
stature, the outcome should be judged problematic. Clearly, this is
another aspect of the proposed arrangements on which renegotiation
sometimes must be attempted.

2. Is the proposed institution likely to attract
professional talent? That there will be

staff the project when it begins cannot be expected,

necessary for the program's success. So long as adequate funds

are budgeted, and this is, of course,

shortage of trained specialists need not in itself be regarded as a major
deterrent.

But there are two other aspects of the
that should be checked explicitly in the survey,

of these can reduce the likelihood of success irrespective of the
magnitude of the training provisions.

The first is the availability of

special programs that will be established.
of individuals with undergraduate degrees fields (e.g.,
education) who can be released from their duties to begin a
career in testing? Are higher
qualifications (e.g., a master's
who can enter advanced measurement not, the
country must be prepared either to invest in an exceptionally long
period of institutional development, or to admit that it is not yet
ready for a major effort in testing.

In collecting information on these points, it is
with statistics on high-level manpower, but with the names of the
actual individuals who are being considered, and to interview a sample
of these individuals as part of the survey. Using this approach, it
was found in one country that there was but a single individual with
an advanced degree at all related to testing, and that even he could
not be spared from his present administrative position; in another
country the seemingly large pool of available

mainly of rejects of other training programs, of professional students
who had been accumulating degrees in a wide range of courses, and
of individuals who for one reason or another could not leave the
country for advanced education. Such more precise information is clearly
essential for a realistic assessment.


The second related consideration is the degree to which the
proposed institution will be able to compete for the available talent
with the many other organizations that would draw on the same
resources. From the point of view of the prospective trainee, entering
an essentially new field of as yet unknown status in his country entails
a much higher risk than pursuing one of the well-established careers;
and if the conditions of service are not up to par, he is likely to turn
down the opportunity for training, or take it and then look for a better
job. What the proposed institution will be able to offer in terms of
salaries, fringe benefits, upward mobility, statutory protection, and
all of the other conditions that are spelled out precisely for the
standard public service positions should be checked as part of the survey,
to ensure that a career in testing will be competitive with the
candidates' other options. If not, it may prove impossible to meet the
projected training quotas with the caliber of people desired.


The fact that both this and the preceding question relate to
personnel issues underscores the pivotal role of this factor in the
development of a professional institution. A project that cannot
assemble the requisite skills will fail, irrespective of the amount of
money or equipment or professional assistance that may be provided.


Do the testing services that are to be implemented immediately
include at least one that is a permanent and highly visible operation?
Because institution-building is a lengthy process, another of
the basic requirements for success is survival during the developmental
phases. Before the institution can become truly viable and self-sustaining,
unanticipated events are likely to threaten the conditions on
which it is dependent, and there should be adequate safeguards
to weather these storms. A change in government,

arvd the rfn\-ir * * revision of external assistance policy, 

he Am?xm ^ examples of the 

storms the AID/AIR projects encountered.
The most adequate safeguard for institutional survival, recognizing
that none can be absolute, is a charter that includes at least
one of the testing services that in this particular country is judged
essential. Normally, such "essential" services consist of only three
applications: the certification examinations of the secondary school
level, the entrance examination to the academic curriculum at the
secondary school level, and the university admission examination, in that
order of importance. All other applications, however much they
contributed to the above payoff projections, are of relatively minor
significance against the present criterion. The inclusion
of at least one of the essential three in the initial charter
should therefore be given considerable weight (and perhaps
renegotiated) as a central factor in the likelihood of success.

One drawback to the inclusion 

certification testing is that ^ ^^e staff time they 

methodologies ^ the more innovative types 

will consume will necessarily detrac concluded that the 

of test applications But, " 4 “ 
increased prospects of continuity mane uiis 

"home" for a testing activity ^ve alre J bureaucratic lines 
that It afford maximum opportunity for c “®^j.^uations. And 
of authority to provide testmg , b^b, top^ofessional 

the second is that it be able to attract ^d reto top 
talent. For long-range viability, both a 

if the proposed charter is such " ““ ““ 

first be established withm a ^ not be further considered, 

criteria, this final t^asibilUy oueslmn need_not^^^ ^ 

But such charters may well be ““ “ ul structure from these 

projects began with an optimum case in future efforts 

potats of view, and the a modest scope and 

Serg^rratLprhmrars moimt 

feasibility study. 

Thus, the indicated aPPCoach^^U ute^atora S.‘d '“the" 
The number of specialists needed per year should be determined
by developing a detailed staffing pattern and training schedule, in
accordance with the planned scope and local conditions. In general, two
to four outside specialists is an appropriate number during the initial
years, depending on the size of the proposed institution.


Professional Training 

A second major cost item is the provision of formal training for
the professional staff. Normally, this training will have to be carried
“qSred " advanced measurement programs 

le be 

center, or from a combined work-study program that has both applied
and academic components. Even short courses, such as those offered
periodically in Princeton by Educational Testing Service and in Lagos
by the West African Examinations Council, can be extremely helpful.

At the same time, however th<. ; ^ 

^ee as a status symbol also must be advanced de- 

“ possible to attract able ^ countries 

net e^ Mlh the conlerntl of a dipIoM^Tn’ ° ^ does 

*!flu ° '“““'‘and the prolLsloiii'^ ^vanced degree 

»™r„e~Soi« 
Facilities and Equipment

A testing center, unlike many other types of institutions, does 
not require a specialized physical plant. The centers associated with 
the AID/AIR research have been sited in residential dwellings, on a 
college campus, in an ultramodern skyscraper, and in other structures 
with entirely adequate working conditions. Special construction costs 
need not be projected. 

The most important type of specialized equipment that is re- 
quired is adequate duplicating machinery for producing test forms 
with a minimum of delay. Although the final test forms would normally 
be printed at a professional printing house, the need for rapid
reproduction of trial tests (and perhaps numerous revisions) makes an
adequate in-house duplication capability essential. Basic multilith
equipment or other processes that can accommodate drawings as well as
text should be provided from the time the project begins, and the
addition of more elaborate units later should be projected.

Next in importance is high-speed data processing equipment,
including a document reader for test scoring and a medium-capacity
computer. The procurement of such equipment can be deferred until
the scope of activities justifies the expense but, in light of the long
delivery lags, should be planned well in advance. The availability of
a computer has many advantages in addition to data processing (e.g., 
in stimulating research), and methods for obtaining one at the earliest 
possible time, such as cost-sharing arrangements or the provision 
of services to other organizations, should be explored. Because it 
is generally advisable to rent rather than purchase this type of equip- 
ment, its cost will eventually have to be subsumed under the center’s 
recurrent expenses, but it can be considered a reasonable capital 
investment throughout the developmental stages. 

Another cost to be projected is that of stocking an adequate
professional library, including if possible microfilm readers and
records as well as books. Among the recurrent expenses,
subscriptions to the major measurement journals also should be included.

Overall, the hardware costs of a testing facility are relatively 
modest, and procurement can be phased over a fairly long period of 
time. As earlier noted, it is the professional skills that are the
most costly and most crucial component. 
Cost Reduction

One of the ways in which the establishment of a centralized testing 
facility can help to reduce the costs of separately administered testing 
programs was described in Chapter 2. This is by combining the testing 
operations of a number of organizations that have similar needs, and 
thereby lowering both the costs of test development and the recurrent
expenses. A second important saving is in reducing the huge sums
that many developing countries now must spend to maintain the
security of their tests before they are given. Once the institution has
developed the necessary logistic infrastructure and is capable of
producing alternate forms of the major tests, many of the current
costly precautions can be eased, and a sizable saving effected.

can ^ '”““P“rPose testing center is that it 

S toX Uiat do not subsidize its oper- 

otlTetlies Lot,': ■'Pip “P'PPy 

West African BamlnaHonr'r ^ The earnings of the 
Checklist of Feasibility Questions
4. In light of the infrastructure and operational mechanisms to
be developed, what is a reasonable time frame to attach to the above
payoff projections?

5. Will the proposed institution have from the first the services
of a capable and charismatic project director?

6. Will the proposed institution be able to attract and retain
the professional talent required?


7. Does the institution's initial charter include at least one
"essential" program that would help to ensure its survival?


8. Will the institution's initial organizational linkages be
sufficiently flexible to permit change if these linkages prove
unsatisfactory?


9. Using the qualification gap between the proposed
positions and the available candidates as an index,
how many years of outside specialist services will be required?


10. What mix of formal degree training and
short courses should be projected? What are the attendant costs
and the time required?


11. To what extent will the institution be able to offset its
operating costs through earned income for services provided to outside
organizations?


A number of these issues will be considered further in the
chapter on organization and operating procedures.
OPERATING

PROCEDURES 


Once it has been decided to proceed with an institutional de- 
velopment program, many additional highly specific decisions have 
to be made. The general structure and scope projected during the 
feasibility study must be elaborated in full detail; the exact mechanics
of implementation must be developed; both short- and long-range
activity targets have to be set. And even though many of the decisions
are likely to be changed as the institution develops, their influence
throughout the initial phase (when the entire concept may be on trial)
makes careful planning at this beginning stage especially important.

This chapter describes the different approaches to program 
implementation that were followed at the various project locations; 
where possible, it relates these to subsequent accomplishments and 
limitations. But broad generalizations cannot be attempted, for none 
of the topics discussed in this handbook is so highly dependent on local
conditions as the specific program mechanics, and there may well
be as many "ideal" approaches as there are developing countries.

The objective in this concluding chapter is simply to present a broad 
range of alternatives for local consideration. 


LEGAL STATUS 

The decision that is made about the organizational locus of the 
testing establishment can have far-reaching consequences, as de- 
scribed in the preceding chapter. Careful consideration should be 
given first to the selection of the agency or agencies through which 
the center will be legally incorporated within the government struc- 
ture, and then to the exact nature of its interconnections with the 
agency selected. Whether this is to be a permanent or interim ar- 
rangement also should be decided. 
No two of the testing establishments that were created or
expanded as part of the AID/AIR projects are identical in organizational
structure. But they can be grouped into four general types of
institutions that may be regarded as fundamentally different models for
institution-building. The AID/AIR experience with each type will be
reviewed from the points of view of (1) ease of establishment,
(2) recruitment of staff, (3) research opportunities, (4) practical
applications, and (5) overall evaluation.


The Testing Division of 
a Government Agency or Department 

unit is to estabUsh the testing 

central eoverame't’’ f departments of the 

UheSt ft. ^ a • both the 

niatoM ftr *" With the National 
Ease of Establishment

spectal PcSl^J^^j “'^“^^'''nent of the activity posed no 
steps required to execute a blnalinn.! ^ attributable to the many 
totracotntry arrange^^ll agreement, but the 
unit typically can be created at the discretion of the agency head
without outside approval.


The professional staff fn »f.i 
S“tt' «rtl serrtce, su“ec^ T "dcessarily part of I 

also has the cIlSh ®°”™”ent Is by far the"!^* ibe norm in cou: 
Which of these has the greater impact on the opportunities for
attracting and retaining staff may depend on the size of the
establishment to which the center belongs. In Thailand, the huge reservoir of
professional talent that is part of the Teacher Training Department
could be (and was) tapped to staff the center; a number of career
civil servants with special interests in this field were identified and
simply assigned to these positions. Recruitment and retention both
were facilitated by the center's status as an
But in Liberia, the three-man Student 

not comparable in-house resources, and in ivi u for 

already part of the civil service had to 

these positions. And, here, the inability of the center to offer extra
reward for the extra training required posed problems
that never were fully surmounted. Candidates could obtain
positions in other agencies that offered the same salaries without
subjecting them to the rigors of advanced measurement courses.

On balance, civil service positions are superior to positions in
less stable organizations, such as the Nigerian Aptitude Testing
Unit. But they may be less attractive than positions in permanent
organizations devoted exclusively or primarily to testing, and that
have the flexibility to reflect this orientation in the conditions of
service.

Research Opportunities 

Both centers prov^to ^ ^^^"^"^"^“nVomp^S Part of 
both an Impressive number of =1“'*'®® easier access to large 
the reason, perhaps, is that operates the educational 

samples of tryout groups than t e S readily arranged by a 

are required. 

The main Umitation on the --"rt^'IhlfsU-nhVs^^^^^^ 
part of an operating branch e“’eroment ^ th 
time that is available for “ P" ” example, the main 

operational agency functions, to Thailarri, 1 „ 

tMk of the Teacher Training nii fsts to the center without 

Sd not release its 

rn?'ri!b‘r .o ^ 

departmental duties. 
But these were not serious problems and, at a certain stage
in a testing center's development, may not be problems at all. (In
West Africa, full-time professional staff are currently being
encouraged to take on extracurricular teaching positions, as an apparently
useful self-development measure.) A center situated within a
government agency should experience no difficulties in carrying out the
needed research.


Practical Applications 

thB ot tUs modGl, as noted In 

with a ^ P”®***'® limitation that an alflUatlon 

and™S„^SMW?p“pSon'!“'"'"' 

Department ct Edu- 

conducted The center 

school levels, develme^?' w “ t' ‘f® “bIi 

level, administered emiOTm?nt'i .*?‘° “ip university 

ments, and provided tcsthf/It? Bovernment depart- 

eommerclal organlzaUons ^ vile Industrial and 

tsaslble In a cwnlry the sir^rfS “PPUcallon 

the center's progr™ “l P'sh subsumed within 

dude that no aSmallve diarter^’ reasonable to con- 

cr scope lor a Uberlan TesH^ Ser ™ freedom 

partly because the IntJcSiJ’cllm nf"®' services was not developed, 

0 the Initial project oh Icltvra^s^ PPP“catlo,m was not 
dlllSuuf '’’“P *“ ““"s>^y brler^li’,?, '>auausc the eictemal 
partS^e , P P™‘cr inteerai if 

partment conducting the school ^ Teacher Training De- 
ts examinations or ?hf 

S '^“■Bh there are “■= “vil Service 

srfw^ff ' put within Ite chfT f”' =‘PPUPP‘lons that 

B cw into an aU-pcep.,^, nationauSf^f P^P^My not 

dlUlcully Bovemmert aeenev strue- 

Deoartmor.* . S departmental ““I'* comparable 
the country's trained measurement personnel. Nor does this imply
that a series of separate homes to serve the needs of the various
branches of government should be developed. Thailand cannot afford
the costly duplication of effort that this would require. It seems as
clear now as it did at the beginning of the project that the need is for
a centralized testing service that can serve all of them. But the
project did not show how such a center could or should be developed.


Evaluation 


The tentative conclusions that can be drawn from this
experience are that the main strengths of this first model lie in the
following:


1. The administrative simplicity of establishing and chartering
a new testing center; and


2. The ease of arranging and implementing the extensive
developmental research that test construction requires.


Its main weaknesses lie in the following:


1. The potential difficulty of recruiting able staff into the regular
civil service structure; and


2. The potential difficulty of extending services to other
autonomous organizations.


Resolving both of these "ustC^^ 

Single-agency ™°f„VobIems but no bureaucratic prob- 

The fact that serious jieTn Thailand the situation was 

lems were encountered in » cimolv of the enormous difference 

just the reverse, may be a have the profes- 

in the sizes of »=== e countries that afford the bureau- 

lllac "Se'niaUty of^ i Uberia probably are the exception. 


Still, some steps can be •^^"^.““rotnf^^rUmHed r^fces 

adequate ‘“f “^m 

remains a uontinuing P bl ^ 


dry’s umueu 

“;\Ti;s\^;;ntrnuingproblem^^^^^ 

Seri:rer.‘o^”" 

in which it is now an tot^rtests are not feastbie, 
(as was set up at the beginning of the project in Liberia and a number
of other countries) may help to bridge the gap sufficiently to provide
the center with a de facto interagency charter.

But such inherent weaknesses, rooted in long-standing local
characteristics, cannot be overcome completely within the life of the
project, and will continue to impose some limitations. And it is
therefore important to assess their probable effects at the time of the
feasibility study, as stressed in the preceding chapter.


The Testing Division of 
a Quasi-Governmental Organization

becom?/irt°rf *“ 

than tcstlS aM lunctlons other 

In this cSl’ the tor Its support. But 

ol the central gowmnie^SJ^fh'°° S2!.o"0 of the operating branches 
with tte ln«ei°li^”„tr oT^r;Ja'f ““ 
that are being es^UsH h, -goremmenlal organlzaUons 

.».totreal‘Ssa^rrc?‘^f^i-ae.rtn^Serve!^ 

organlaaU^'!'*^^® *■“ "^Ut Utssn tjpes of 

■ncnt only. It a^ ^ ^rea, was an Interim arrunge- 

ahmlnlstraUrely to the Central •“ “ 

Is a private organliaUua 'n“ ’''=®sroh Institute (which 

s^talning gove^^®^, ®^®'r®s bfll ol Its annual budget as a 
f P nntll a permanent K relntlon- 

^r a pvrtoj ^ months^ relationship lasted 

“t® permanent InstlluUoS toe ‘''® ®stnblishment 

onse tnal had been envisioned. 

^'°lheM'uoValS^F^^„>«^entl^ 
ooSS“°° ®°™®rted ^ ® ‘®'^® ttnlependent 

2 d ®®”>>er ol educallo^a^L^ government grants. It 
had established an j , institutes in R naiHi 

Structare TheSfik*^'®’ “t' *nsUlate'?^I’'f ““t ns a research 
tirelv auhmn ‘^® '“*“tolon decided tn . ^ regular organlzatioual 
years, and members of the AIR staff are still serving in Brazil as
advisers. 


Ease of Establishment 


Although a period of approximately six months was required in
each of these countries to formalize the arrangements, the agreement
in principle had been reached quickly, and the research was
carried forward throughout the administrative negotiations. As in
the above model, only a few individuals had to participate in the
decisions; and this greatly simplifies the problems of obtaining
authorization to proceed.


It should be noted, however, that in both of these countries
research was already part of the parent agency's operations, so that
a basic policy decision was not required. Had the proposed
arrangement required these organizations to enter a new field (as
would have been the case, for example, in such institutions as the
Korean Institute of Science and Technology or the Applied Scientific
Research Corporation of Thailand), greater difficulties would probably
have been encountered. But this need not be a deterrent to the
exploration of this type of arrangement in countries that have no
institutions chartered for testing, for it might still prove to be the
best of the available options.


Recruitment of Staff 


Although positions with these types of quasi-governmental
organizations are thought to be highly attractive, this was not
put to the test in either country. The most senior professionals
typically hold a number of appointments with separate organizations,
and could take on these functions in addition to their other
appointments. There is some reason to believe that these organizations
could also compete effectively for full-time employees, but this has
yet to be shown.


One clear-cut advantage that “^^^^eTpou’ciei can be 

the regular civil service, the bourse of the 

much more readily changed i^tttat the staff's sala-'' 


much ^mor; readily changed -iarics 

Brazil project, for qualifications; the Vargas 

were not nearly lary schedule for all 

Foundation P--““^‘"VcS^UlcaUon. Such changes are at best dU- 
riU^mgSo^l irL evil service structure. 
A second advantage is that a private organization can more
readily obtain staff on a loan basis from other agencies than can an
official government unit. In both Brazil and Korea a number of the
initial staff needs were met by the "secondment" of personnel from
outside organizations, and this is an especially helpful device for
stretching the modest budget with which new testing centers typically
begin.


Research Opportunities 

mnrs research naturally require 

more etforl in this model than In the above because a quasl-govern- 

agSs'concec a ' 7" tb the operating 

afencl^ toe »hdn these 
operations, it does tenrto UmU “”6blng test 

methodoloeical h ° research, such as the 

nuXfX^vomr/es * sreater 

required. ® design of the research also may be 


drawing on the prestige and ne’rsoto ‘^“'5,':"*“®® p® h® overcome by 
assigned to the center and In both”?? 'ha senior specialists 

portant avenue to research But ^Pt®a this was an Im- 

program developed within this f ^*te research 
more opportunistic than tomnrehS“*'™'‘" he 

though other lactors were also 'h «‘aPPPnted. AI- 

Korea proicct generated as tucT "■= the 

dal" Thailand and Uberia centers. “'p "““t‘ 

Practical Applications

toe b7h ""'p ““‘P' ® a'^f PPaellcnl test appUcations, 
toe PPaltl^Itecr'm “■'^P-hPalion eaa 

n“ count ! PPPPPtPdlles tor researi i, ['!Ep“’p “Ppp*. as in the 
an olHcfal u Suaranlced minimum if in ‘™p ep”ter cnn- 
vdop Us eSr.™“'“‘ has Ir”m tiutn!?” fPPhPallons that 
’»‘ll eenerallv “pp**- the poslllvo 

least Ih'Jre Inf," ‘he sconHi ,h„ “’P‘ “’PPP 

Close doors tn ^ since the intera it can at 

-P ‘hat uutmatety rre^;?lore!‘“"""“ "" 
The practical question, of course, is whether a large number of
operating agencies will in fact turn over their internal testing
operations to an outside center over which they exercise no control. For
some types of agencies, such as industrial organizations, this poses
no special problems. For others, such as public agencies,
this may be an extremely difficult thing to do. So far, the services
of the Brazilian unit have not been extended to these more sensitive
types of test applications.

Because this center has only recently begun to offer its
services, however, an assessment at this time would be premature.
The delicate mechanisms necessary may yet be developed. The potential
of this second model therefore is still unknown.

Evaluation 

Tentatively, then, this second model appears to be an especially
good vehicle for beginning a testing program when a final decision
on the institutional structure must be deferred. As a permanent home
for testing it seems to have clear advantages over the
official-government model in recruitment incentives, but corresponding
disadvantages in carrying out a comprehensive program of research.
Its relative merits in affording a broad scope of operational testing
services have not yet been determined.


The Largely Autonomous National Center 

The third approach dilfers of a larger 

important respects. The first is tha ^ .. ,, .„m institution de- 
organization, but is Itself _ second ^s that the majoT^ers 

voted wholly or mainly to '^-ement of the organization, 

ot testing services partici pa te in the » ol funding may 

at least It Itself may be 

still be the government, ^ twTelements ol autonomy ^d 

regarded as Its distinctive characterist . 

consumer control give this third roouei 

The first center of /^Mting ^n»! w operated as 

projects was the ^^=J“'‘"4nths prior to Us merger 

an independent organization lo ^ f^nuncil The second was the 
with the West African Srioral Sciences, which was 
period of its existence; the Korean Institute throughout the first year
of its operations.


Other examples of this model are the four national organizations 
that comprise the West African Examinations Council, though these 
were, of course, not developed within the scope of this research. The 
council was established in 1953, seven years before the first AID/AIR 
project began. 

Ease of Establishment 


tn h, ol a national testing center has so far proved 

The time lag between the 
acSsfahv" e ^ Nigeria and the 

tw^iis ! ''‘8«lan Aptitude Testl4 Unit was nearly 

required as noted **^*1^^^ arrangement ol fifteen months was 
be inherent in thiQ patterns of evolution may well 

pendent agencies and th J^*^*^*^ agreement of numerous inde- 
ceptable to them all ^ '*eU-defined charter ac- 

o^rse. ad^au iJ^erlm 

this model In most developto^^ljrte^* sssential prerequisites to 

moreover, was Ui^re'ady°av!dlabSr'T'T*''^'‘* “ h““btrtes. 
had stroi^ supporters. In Nitr^He. stternative models that also 
legal responslbiuty of'the West Alrir** testing was the 

small program ofTpUtSle testiijt r,? a 

^ programs were being nlan^ begun by one university, 

rf St “P'lhlmies of at lea^two expansion 

™ underway W r Ministries 

provided a reasonable base lor forth.* efforts could have 

Korea, ongoing lesii,^ Pf^'P^MonaUzation. And in 

lileS p“h cort^olied a ™°”K a variety 

t-tttu^orhuri^ - t“h\?e‘4: 

a truly Independent and 
indigenous establishment do provide further illustration of the
complexities of developing largely autonomous national centers.


Recruitment of Staff 

With respect to professional recruitment, this model is
substantially superior to the preceding two. These organizations can
establish their own salary scales and conditions of service; and, being
oriented explicitly toward measurement functions, they can reflect this
emphasis in the incentives provided. In both Nigeria and Korea the
conditions of service are considerably better than in the other
countries, and there have been no problems in recruiting highly able
staff.

In the Nigerian Aptitude Testing Unit 
of this organization did jeopardize staff re en , ^ African 
the unit operated within the statutory char e -tatute itself: this 

Examinations Council rather loWcd wl» 

meant, as the staff well realized, that it 33 [ally jnte- 

wlthout cause, at the council’s pleasure. Until the uw,w^= ^ 

grated wlthtn the council, staff r|,u!rements in this ap- 

Jurldical basis may well be one of the main requi 
preach to institution-building. 

Research Opportunities 

u thic; model are theoretically 
The opportunities for aHicinate in the center's man- 

high because the many agencies P operational access required, 
agement can (and usually jighed may be constrained by 

But the research that is in fact accomp 
two other practical reasons. 

The first is that mus^dev^a substantial portion 

pect to be subsidized ", endeavors. In the case of 

ot their activities to ■’evenue-pr^oclng Selences. this 

the Korean Institute for ® fZ time available for research, 

may not impose a rL^rch funds as one of the 

because it is planned to attract ,3 3,eee service-oriented 

primary sources of "^‘[‘‘fulglri^Aptltude Testing " IVesUng 
A second related reason is that the management of this type of
institution is more demanding than the management of the two earlier
organizational models, and that the most able researchers are
typically assigned to the senior administrative positions. And, as in all
organizations, administrative crises regularly take precedence over
substantive pursuits.


Thus, it may be that certain special features have to be
introduced in the implementation of this model to insure a continuing
research emphasis, if this is intended to be a major institutional
objective. But the kinds of features that will indeed be effective are not
yet known. The Nigerian Aptitude Testing Unit tried to create a number
of staff positions that would be devoted exclusively to research,
l^al ^“'“fdPf'-actlng basic research to the 

present was particularly successful. The ’ 

too nT?"? “"'rtng research services to outside 
agencies is too new to be adequately assessed.

Practical Applications

range rfU°sHn^fedrin'?d!.“‘. “““"S ">= 

railage. Both the Nigerian ““fa'aPflfPg 

can feminalto^ Cou“u etSSMtl 

1968, virtually aU ol the Srowth; by 

able in Ntger/a, anS acmaTweSt,l‘'r‘"« 

potential consumers. being provided to all categories of 


A notable feature of the Nigerian Aptitude Testing Unit was the
degree to which the major users of its services participated in the
development and management of its operations. Figure 26 shows the
composition of its 23-member governing board and the breadth of
representation that was achieved. A similar pattern, adapted to local
conditions, is suggested as generally desirable for all autonomous
national centers.

Examinations Council 

also devM A ^ addition to this seninV" was established 

prortncialTpl'*? ol subcoi^lH..^™?!,**®'' ‘''® “"“=*1 lias 

proving each of ih^ 'fa'egated to these lhe*li,!’°*ili**’! and 

has created a Partkl^i roStor^ta^ 

a; rule tor a large number ot key 
FIGURE 26


Board of Governors of Nigerian Aptitude Testing Unit


Organization                           Representative(s)

Federal Ministry of Education          Permanent secretary            (1)
Regional Ministries of Education       Permanent secretary of each    (4)
Federal Ministry of Labor              Permanent secretary            (1)
National Manpower Board                Secretary                      (1)
West African Examinations Council      Delegates                      (4)
Universities                           One delegate each              (5)
Employers' Consultative Association    Delegates                      (4)
Chamber of Commerce                    Delegate                       (1)
Any                                    Chairman of board              (1)
Nigerian Aptitude Testing Unit         Director                       (1)


individuals throughout the influence of the 

mental in developing the present ^ certification tests should 

Any center that will have responsib Uty for 
find this a highly useful model to follow. 


Evaluation 

The main advantages of a largely autonomous national center
appear to lie in the following:

1. The degree to which it can meet local testing needs of all
types in all sectors; and

2. The attractive employment opportunities it can offer to
career professional staff.

Its major weaknesses are the following:
2. The lengthy evolutionary process that may be required to
bring such a center into existence.

Whenever local conditions permit, it should be regarded as the
preferred model for meeting the full range of testing needs in a developing
country.


The Largely Autonomous International Center 

The final model is that of an independent regional center that
serves and is supported by a number of countries. The prime example
is the West African Examinations Council, to which
assistance on a regional basis has been provided for the
past four years, as a continuation of the earlier Nigerian program.

^ ^ Africa, which also are 

patterned to a considerable degree on the West African organization.

Ease of Establishment

tutlonl^ e development of a multinational insti- 
center. All of the dlfficultlpc?^ autonomous national 

taaln, and the further difficuiH ^ national level re- 

sovereign states?omptf,?'J.““ “s InieresU of 

encountered. An even lonrer problems that wiU be 

almost certainly be req^rted."^ planning and negotiation wiU 

to pool their resources and^mn'a^ opportunity lor smaller countries 
t^t IndiriduaUy they would ij hant"^^ “ ^volessional capabUity 
^e decision that uirla m^e^Hf'L'f’ "> ’’"'‘“P- fas 
Sion; a similar decUlon rS^su'teef "'m » Pvecedlng dlscus- 

'°™'V‘'S. such as^lgeiS T if *“ btalawl. Even ior 

nlor^'\°’'''^® potenllally attracUve”?”^* Ghana, the regional 

P-od.hengeograpMc.eU'TX'STilrtlJ^s^t-^.- 

^“inatlons Council in 

.-s;=.>5s£“ S™ “=•= 

“oer countries, but has Instead 
adapted its programs in each country in accordance with the decisions
of the local committees; in this way it has provided a maximum of
benefits with a minimum of control. And such flexibility may generally
be essential for the long-term viability of a regional organization.


Recruitment of Staff 

Regional organizations are generally in a highly favorable
position with respect to the recruitment of staff; they can draw on
a large pool of potential candidates and typically offer conditions of
service more generous than those in the member countries. This
has been the experience also of the West African Examinations
Council, which has consistently been able to attract outstanding
staff. In this respect, the regional model is probably superior to the
others.

Research Opportunities 

The basic coniuct between the 

mands of ongoing operations that organizations. For, 
of the national center appUes equally to ®joes give It greater 

even though the larger “‘^“portl^Ltely llcreased 

flexiblUty in the assignment of staff, me prop j oonstrain 

demands of its far-flung reLarch. 

the amount of time that can be devoted to innovative 

Thus, the establishment j oj a^operallonal re- 

Development and Research that wou . the activities of this 

sponslbiUtles was probably helpful. ^elopment" much more 

office have in fact had to e S “he council's ob- 

than the "research." The routine “• preparation and analysis 

jective achievement tests re will continue to cUmb 

of 2,500 new test items per a p^verted to objective 

as more of the receive continuing 

versions. If research is soecial provisions may have to be 

of course, is - as” mrnariofal center. 

made in the regional just as m 

rovide a wide variety of testing 

That a regional “rgutd^t'o” 1 „ ,he recent 

services toils “‘’"'J?'; ®°r&LlnaUon3 Council. The many services 
history of the West f ■■ f " ^ria are now being extended also to the 
Initially available only I ”>8 , appears to be entirely a 

other countries, and the regie 
effective a vehicle for meeting the full range of testing needs as a
separate national center.

Certain aspects of the West Africa experience, moreover, suggest
that a regional framework may even accelerate the growth of
testing services in the countries it serves. The council's initial plan
had been to defer the introduction of aptitude measures until it had
accomplished its highest priority task of taking over all of the
certification tests that were still being set and marked in the United
Kingdom, a demanding and long-term effort that has not been completed
“Syria's rapid development Increased the de- 
NlcpHil'n ^ ^ demand led to the creation of the 

to revisMN counclfs largest member country, it had 

tte g™ti f “ "a “™- 0'e^^al‘oually, it could not i^ilore 

notably sccondi™ ^^'^^^hment that In certain fields, 

over services thTt the°“ testing, was being asked to take 

It could Mt fullv But legally 

under Its regional charter 'ompetlllve operations 

In Nigeria to a “o the 

aptitude testing was Introdiic *!j?'!*^k** indirect manner, 

the orlgtral piL Introduced In these countries well In advance of 

to other 0'ganlzatloM^oS°cr!i”i*i^” probably not generaUzable 
that a regional center must he r ^^tes. But the implicit proposition 
countries, and ttS a n™ slres,2’r'''" “■= “c-^Per 

of new services In also the olhe™ *" avallablUtj 

national service orgaiUzatlo^ H applicable to all multi- 

especially potent approach to rapid model could be an 


regional organization. 

“o5K£“ ~ SuSKK 
The main limitations of this approach are the following:


1. The establishment of a regional center is likely to entail a 
lengthy delay while the necessary arrangements and negotiations are 
being completed. 


2. In many locations the realities of geographic, [...] and
political factors may make a regional testing service entirely
unrealistic.

On balance, the two-stage process of beginning [...]
arrangement and then transitioning to an independent
regional center may be the generally most productive model for
institution-building.


FINANCIAL SUPPORT


Patterns of financial support [...]
of the basic model of organization that [...]. But the three basic
sources of funds for all models are government grants, center
earnings, and external assistance. Each is considered briefly from
the point of view of the perhaps generalizable ideas that were
developed at the various project locations.


Government Grants 

The mechanics of ^ovemment subsidy^^^tak^ -any 
At the simplest level, center staff to its rolls M 

ganizatlon, the latter can --P*T ^^, 3 ^ director to charge other 

regular employees and accordance with estabUshed requlsl- 

expenditures to its L , This was essentially the approa 

tioning and accounting procedures). This more 

used in financing the Liberia C“‘“-„„ance a portion of to 

complex pattern within the modej “ "JSer by drawing on the 

costs in this manner airf to P separate government account 

special funds that pceB^ams. This was done In 

for the support of subsidize a^Iargely autonomous ccnle 

both Brazil and Thailand. To sutel^ze a la ^ ^ 

the government Testing Unit; or It can contribute “ 

of the Nigerian Aptimde ™S_^^^j ,^3 can 

sum on the basis o ^be^g done in Korea. Or the government can 
mits for review, as i:» 
A second major category of fees includes those assessed for
the entrance examinations to government-operated schools and training
centers. These programs can generally be operated at a profit
with a low per capita fee, which may be paid by the government (as
part of its grant or as an additional assessment), by the candidate
directly, or, as in the case of the Nigerian Aptitude Testing Unit, by
the individual institutions, which themselves charged a fee to each
candidate for admission. In the Nigerian operation, the amount
of the fee assessed by the center for such programs
took into account the share of the overhead costs that had in
effect been prepaid by the sustaining government contribution.

The third major category conslsU of fees bJ'^BCd f“r testl^^ 

services to institutions that do not provide o er . g,.g jjgre 

the center, such as selection programs for P;;™‘“mp^ Z 
it is reasonable to assess fees that will , ‘he total tn^^ 

program, and also contribute to the surplus , .gj gj a modest 

Nigeria schedule of fees for such appUca lo "set-up" charge 
pe; capita lee (approximately VSM “ 
for testing small groups, and any relat 

income from services f “ nd rS 

as the center expands its capabilities center's program, 

search, especially if it ‘f ‘‘{'■‘“‘‘i'/L explored, as the Korean Insti- 
ls the major possibiUty ttot sho , ^ alces high-speed data 
tute is currently doing. Once the ac^, there 

processing equipment and c® , or services or consultation to 

should he many other opportumiies for 
improve the center's financial position. 


t^cfinrr rpiiter caH sometimes 
In addition to such cash '“^“j/sepport that will similarly 
obtain also direct personnel and 1 P arranging for pro- 

help to meet its operating agendea has already been 

fessional stall on "eecondmen ' ««« found to be available 

noted. An expanded “PPd'tumty t have to sub- 

in Brazil, where “"‘‘‘‘‘“‘^^eHence as a prerequisite to graduation, 
mlt evidence of practical „lthout pay to accumulate toe re- 

and therefore are eager "‘““"gency that could not tr^fer 

quired number of hours, testing agreed Instead to do toe 
Lds to the center ‘“Pey/feerrfS provided; in Nlgccla^^^'a^'f 
printing in exchange i® ,, 2 ^ barter arrangement. Oppo 
were constructed in most developing countries, 

these kinds are no doubt avaiia 
A final and perhaps obvious point is that all funds not immediately
required for operating expenses should be invested. Short-term deposit
plans are offered by many banks, and interest rates in the
developing countries tend to be highly attractive.


External Assistance

The requirements for external assistance also can be met in a
number of ways from a variety of sources. In West Africa the Agency
for International Development is currently providing a resident testing
specialist and certain commodity support; the Ford Foundation is
funding the staff training programs; and the British Council is
supplying a subject matter specialist to assist with the development of a

^ projects of the testing center 

^ Foundation, as earlier noted, 

nartiai f.tnHu f^O'^ndation and the Fulbright Commission provided 
aid conir^^H f specUlc research undertaking to supplement the 
^erM?2Se°. ..''"“a "“PPOrt. To the extent that 

contrlbutlonsVre ler'hni^'i''?^' hleges, the most appropriate external 

proprlatelv take the t ' , **t®se Inputs can perhaps more ap- 

rangcments can be muhi^ii^ hy the center. Such ar- 

ganlzatlon's ovn research ^ external or- 

eitocttveness ct Peace Cta;hers.Tor%'':L°S;“ 


INTERNAL ORGANIZATION

a testing center Is ahcther to internal structure ol 

Uonal tines, h, u,e programmall?^^ programmatic or func- 
tl>e organlxallon are rwooSe r P«">ary subdlvtslons ol 

•trrices being otic red SI ‘S 

^>s td the programs It hasScn IJ'T' ^visions manages aU as- 
^nirttt“'’ '“Mlvtslora arc restS^ m / “p Innctlonal model, 
»«trt les ,1^ 'PP’Prlse the tcSli^ Ijpes of 

P' ‘ki' ProgratS^t Performs the! 

tntlt^ operaliota. that comprise the center's overall 

“ also can be consfttSrt leatures of both 
For a small testing center, or a potentially large center just
beginning its operations, the programmatic pattern of organization
appears to be the better. When there is a limited staff, assigning one
professional full responsibility for a program (to supervise the
development of the tests, the field tryouts, the printing, the reporting
of the results, and all other details) is more efficient than attempts
to divide these responsibilities among a number of specialized units.
The controls and the coordination mechanisms necessary to implement
a functional model cannot yet be provided and a programmatic
organizational structure is the indicated approach. This was the
organization of the Nigerian Aptitude Testing Unit when it was
established; a highly similar structure also was used by the Liberia
Testing Center.

Certain of the basic service functions, [...] centralized
even within this programmatic approach. The divisions
of the Nigerian Aptitude Testing Unit [...] internal test
scoring capabilities, but shared a central [...] scoring section that
reported to the director. It was the responsibility of the division
head, however, to insure that the [...] of tests was completed
on schedule, even if he had to employ [...] to supplement
the central service section. And this pattern of unitary
responsibility seemed to work well.

The programmatic approach is [...] a
multipurpose research organization, such as the [...] Institute for


FIGURE 27

Initial Organization of the Nigerian Aptitude Testing Unit
Research in the Behavioral Sciences, that plans to engage in activities
other than testing. Here, the Test Development and Research Department
is one of four coordinate substantive divisions, defined by
program orientation, and then is split itself, again programmatically,
into sections devoted to different types of test applications.


For a large testing establishment, such as the West African
Examinations Council, either approach can be used. Before its merger
with the Nigerian Aptitude Testing Unit, the Council was organized
along programmatic lines, with a designated staff member in charge
of each program in each member country. Then, as part of the merger
arrangement, a functional approach was adopted. The administration
of the [...] remained the responsibility of the existing structure,
[...] of these programs became the responsibility
of the [...] Development and Research Office, and the statistical
[...] a joint responsibility, since these have
both [...] components. The partial organization
chart in Figure 28 shows the essential features of this arrangement.


Ideal ^ model are that it liberates the tech- 

whlch should spur a mfrfactl?'*"* “dnagement responsibilities, 
and that It provides development and research, 

sod the talked adniM«r,t'^*’fif‘?”‘*‘“ ‘alco'ci* professional 


aod the talented , , for the , 

disadvantages are that the^d"^ Dml are equally attractive. Its main 
not receive tte tbe 


not receive the clOT7iiroIessl™^‘'““°!L°' *csUng programs does 

matte model, and that the tees '“•'crent In the program- 

may therefore not be enuaii eequirements of standardized testing 

ment and Research Test Develop- 

for the administration of ** retained operational responsibility 
« la leu, cannot yet ^ 

menlallon. verted to the lunctlonal pattern ol Imple- 


model is Ukely to°be tte'gSralJi"*^’ ''’®''cl°ce, the programmatic 
U My be less clllclent Aunll^^H eltectlve structure, though 

at least lidliall^to ^ lunctlonal approach should 
edher programs that rcCI^' “>ahaEcmeut ot oertltlcatlon tests 
re<pdre less direct protcsslonal superrislou. 


OPERATING PROCEDURES


Anoth - 

fveU'lv°*T'"" p'Sre management 

IT pect ol the center’? veznla?'*'** procedures tor 

Operations. These should span 
FIGURE 28

Partial Organization Chart of the West African
Examinations Council



* «o/-hanicaI routines, and should 

the full range from toShuted to all stall mem^rs. 

be pubUshed in the “SraniLqianded as a 

Though the content jralt should be developed as part 

perlence, at least a 

the preoperatlonal prep specUlc Issues that might 

A discussion ol the the scope ol ■ 

be encompassed in wered can be f 

but the range ol ^ 1„ the manuals prepared at a 

Ing is a sample of toe xop 
number of project locations. 
1. Personnel. Appointment procedures; position descriptions;
salary structure; performance reviews; promotion and transfer;
categories of leave; employee benefits; disciplinary actions; separation
or termination; travel; payroll procedures; personnel records; etc.

2. Finances. Responsibility; accountability; records, files, and
reports; schedule of testing fees; billing procedures; disbursement
categories and procedures; amortization procedures; audits; etc.

3. Supplies and equipment. Inventory; ordering procedures;
storage, maintenance; library lending procedures; accountability; etc.

4. Security. Building; files, safes, vault; printing procedures;
storage procedures; destruction procedures; shipping procedures;
inspection of tests by outsiders; etc.

5. Filing. Correspondence; administrative records; tests;
forms; length of retention; disposal procedures; access; etc.


requests T or Regular, extra, and make-up sessions; 

UsTof materials preparations; check- 
out and return series; material check- 

0 l Eamtaee IdMUtr use dqdlpment; verilicallon 

meets and procedure; noi-chart n, require- 

etc. process from request to scoring; 


of hand-s’corin^kevs^n^J^^^^^' use, and storage 

toys: scoring procedu?es ’^S''s'c'?ri machine-scoring 

lormulas; norms; coding the data- and quaUty control; scoring 
pie lorms; updating Hem banks ^*^**=**“1 routines; sam- 

chart nl process Irom receipt ol papers'tVstorSg™ete!"“°'^' 

‘“'J? rel2s7oIsior2*iroth'^"°‘“’'‘‘’^’ forms; stan- 

ratlons; regular center reports; reports; stall pubU- 


made. 


t- An rrtremely'Sul^' fhe “annals should also be 
an error Il,e ol breaVo™ regard Is to maintain 

" ^^PPfeach may be us produS™ pragmatic trlal-aod- 
a> '< aas In the lerelopmem '“'ntive routine: 

® testing procedures. 
Validity of the I-D Secondary School
Admission Tests

[Bar chart. Rows (Minimum Requirements for Admission): present
standards only; present standards and I-D Stanine 6, 7, 8, and 9.
Bars (Percentage Selected Likely to Perform Above Level of Present
Average Student): 50% for present standards only and 82% with
I-D Stanine 7; the remaining values are illegible.]


stages. v;hichever “s t,orl “ in either one 

"nnibers involved. " ‘"“■■e convenient in Ught of the 

“ ‘-tn con- 

Uberal arts), a 

is unknown, it aii ^^iniization each candidaT^^^ should be applied 
tlclentol ? 7 an? are ueed^l eventuaU, Si? 

ii>e high degree noef- 

suggesta that using both??** i^bveen the Granh ° **® expected 

•echnical grounds ' “''n instrunienb. ^iilmetlo Test: 

oI Its lower intert’e??''”*'™'*' Testta «?,*** ''“'^‘i'whlte. On 

Riore acceotahio » ®°*'*'®Iatlons* but tho superior because 
conditions. Either way, a total administration time of approximately
one hour and 45 minutes will be required.


If the curriculum has a science bias, the World Information Test
can reasonably be dropped, resulting in a reduced
series. Similarly, the Science Information Test is [...]
curricula that emphasize commerce, administration, or [...]
science courses. For these reduced series the Arithmetic Test has
been found significantly better than the Graphs Test in the
studies that have been completed, and may be the more appropriate
to apply in all situations.

Overall, the validities for all of these P°=‘==7^Ximar“wvel, 
are approximately the same as those obtained ‘'1= P?"f 
and the improvements charted in Figure are g obtained when 

projections. At both levels even better results should be obtained when
differential test weights consistently effective for the intended
application have been developed.


SELECTION FOR POSTPRIMARY TECHNICAL COURSES 

Technical selection programs ^"'SllfusriSsWal 

whether in the context of vocational schoo related measurement 

training courses must take Typi- 

issues, but also of the social status of I offices or 

cally, wese are considered far desirable than jobs in 
in government service: and many individu because they 

training do so only because nothing else is available. 


TABLE 18 

The I-D Postsecondary Scholastic Aptitude Tests**


**Details on the numbers of examinees, criterion instruments,
etc., may be found in the I-D Technical Manual.
est as well as potential, a sizable proportion may wander to other
pursuits as soon as the opportunity arises.

One appropriate precaution is afforded by the preselection interview,
which can and should be directed mainly toward the applicant's
actual aspirations. Interviews for technical training should generally
be longer than those used for academic admission programs, and it
should be expected that a larger percentage of applicants will be
rejected at this stage of the selection procedures.


A second, perhaps less apparent implication is that school achievement
tests should not be used as an initial screening hurdle. Even
though success in technical training programs will depend in part on
the same kinds of characteristics that high academic performance
requires, the applicants who were successful in school and yet could not
continue their education are apt to be those least satisfied with
blue-collar work, and therefore particularly poor risks. To ensure that the
trainees selected can cope with the theoretical as well as practical
courses, the use of a scholastic aptitude test is a safer procedure.


Thus, selection programs for postprimary technical training include
two major components. One is a set of aptitude tests designed
to measure scholastic ability as well as technical skills more
specifically related to job performance; and the other is a staff interview
oriented toward the applicant's long-range aspirations. They may be
administered jointly, or separated into a two-stage sequential procedure.


For selection into the skilled trades, the I-D series provides
seven relevant tests, as listed in Table 19. Only five of these tests
are used for any one application, however. Either Similarities or
Verbal Analogies is used as a measure of reasoning skill; and either
Manual Dexterity or Finger Dexterity is used as the dexterity measure.


As to the choice between Similarities and Verbal Analogies, it is
[...] applied. As noted in Chapter 6,
the Similarities Test cannot yet be regarded as fully developed; and
[...] be less valid for most
[...] orthodox scholastic ability measure.
[...] exceptionally low proportion of
theory [...] (as in the training of
welders) should a less verbal test be considered.


lion sSld'S^^rt'’' ‘b' “roroprlalc dexterity test lor a given appllca- 
TABLE 19

The I-D Skilled Trade Selection Tests**


        VAL   MEC   FIG   BOX   MAN   FIN   Reliability   Validity*

SIM      ?    .47   .23   .24   .13   .16       .56          .30
VAL           .36   .19   .28   .17   .03       .87          .45
MEC                 .36   .28   .20   .23       .79          .45
FIG                       .23   .14   .15       .70          .35
BOX                             .13   .15       .72          .33
MAN                                   .47       .83          .38
FIN                                             .89          .37


*Because the Similarities Test was used in most of the West
African studies, the validity of the Verbal Analogies Test is a
preliminary estimate, based on data from other locations. The validity
of the Finger Dexterity Test is for electrician trainees, that of Manual
Dexterity for other trade courses.

**Details on the numbers of examinees, criterion instruments,
etc., may be found in the I-D Technical Manual.


Dexterity Test has seemed to be the more relevant, and has in fact 
turned out to be the more accurate measure. The selection of trainees 
for electrician courses has been the one exception, and for these 
courses the Finger Dexterity Test should be applied. 

The most commonly used combination of tests, therefore, has 
been Verbal Analogies, Mechanical Information, Figures, Boxes, and 
Manual Dexterity, administered in that order. The total time required 
to give all five is approximately two and a half hours, and specially 
trained examiners are essential. In accordance with the data in Table 
19, a composite reliability of approximately .89 and a composite 
validity of approximately .63 can be projected. The corresponding 
gains in trainee proficiency are charted in Figure 24. 
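
The handbook does not spell out how these composite figures were
projected. As a hedged illustration, the sketch below applies the
standard formulas for an equally weighted sum of standardized test
scores to the Table 19 coefficients for the five-test combination.
The equal weighting, and all function and variable names, are
assumptions made for this example. Under those assumptions the results
round to the .89 and .63 quoted above.

```python
import math

# Coefficients copied from Table 19 for the five-test combination.
TESTS = ["VAL", "MEC", "FIG", "BOX", "MAN"]
INTERCORRELATIONS = {
    ("VAL", "MEC"): 0.36, ("VAL", "FIG"): 0.19, ("VAL", "BOX"): 0.28,
    ("VAL", "MAN"): 0.17, ("MEC", "FIG"): 0.36, ("MEC", "BOX"): 0.28,
    ("MEC", "MAN"): 0.20, ("FIG", "BOX"): 0.23, ("FIG", "MAN"): 0.14,
    ("BOX", "MAN"): 0.13,
}
RELIABILITY = {"VAL": 0.87, "MEC": 0.79, "FIG": 0.70, "BOX": 0.72, "MAN": 0.83}
VALIDITY = {"VAL": 0.45, "MEC": 0.45, "FIG": 0.35, "BOX": 0.33, "MAN": 0.38}


def composite_stats(tests):
    """Return (reliability, validity) of an equally weighted composite.

    Assumes every test is standardized (variance 1) and weighted 1.
    """
    # Variance of the sum: k unit variances plus twice each intercorrelation.
    off_diagonal = sum(
        INTERCORRELATIONS[(a, b)]
        for i, a in enumerate(tests)
        for b in tests[i + 1:]
    )
    composite_variance = len(tests) + 2 * off_diagonal

    # Composite validity: summed validities over the composite's SD.
    validity = sum(VALIDITY[t] for t in tests) / math.sqrt(composite_variance)

    # Composite reliability: 1 minus summed error variance over total variance.
    error_variance = sum(1 - RELIABILITY[t] for t in tests)
    reliability = 1 - error_variance / composite_variance
    return reliability, validity


reliability, validity = composite_stats(TESTS)
print(f"composite reliability: {reliability:.2f}")
print(f"composite validity:    {validity:.2f}")
```

Differential weights would change both figures, so this sketch should
be read only as a check on the order of magnitude of the projections.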

If a two-stage selection process is used, the most appropriate 
tests for initial screening are the Verbal Analogies and Mechanical 
Information Tests, which do not require highly trained examiner 
personnel. In this case, the three remaining tests and the interview 
are applied at the second stage for final selection. Or all five aptitude 
tests may be used initially, so that the interview alone comprises the 
final hurdle. 
FIGURE 24

Validity of the I-D Skilled Trade Selection Tests

[Bar chart. Rows (Minimum Requirements for Admission): present
standards only; present standards and I-D Stanine 6, 7, 8, and 9.
Bars (Percentage Selected Likely to Perform Above Level of Present
Average Student): 50% for present standards only, 76% with
I-D Stanine 6, and 90% with I-D Stanine 8; the remaining values are
illegible.]


occupaUons other then the 

or “UchSil aecfeieel' T teehnlelan" 

or lecnnlcal assistant'' courses), the dexterity test is eenerallv 

Smfuf ^ ^ "SOW substitute is the 

opportunities' or mo'^e' advarced"S'lMl“' 

oeries, composite valldiUes near .60 caTb^'e'^'eaed 

among the vario™ c°a'tego?es ot IS'S b-' diHereoUal predlcUims 
iUscovered. In ma„y?pX'a,te™f„^“'== *55^^51 as yrt been 
selection but also placeLnfdSi 

a carpenter, who a machinist who ^ become 

tor weighUng the tests dilfere’nUy for^achT'^rf^**'''’ ^ Procedure 

separate Indexes of potenUal wonin iL ^ ^ obtain 

patterns ol relaUcmshlps between the consistent 

trades have not emerged from the data various 

results of attempts at dillerenlial ororfi generally poor 

lurmar resoarch'o'iZCt'aVir^^^^ 

srams already desSlSd SotSms"/ “"''‘"'e® '» “*0 POo- 
ic admission of curricula with a 
science bias. None of the seven tests discussed in this section should
be used with groups who have had more than nine years of education.


SELECTION FOR CLERICAL COURSES 

For clerical selection programs, the scholastic achievement 
test returns to the ranks of useful screening procedures. Because 
the language and numerical skills emphasized in the regular school 
curriculum are directly pertinent to the typical clerical functions, 
and because the above problems of social stigma do not arise, appli- 
cants who were successful in school are generally good prospects for 
training as office workers. And although it would not be economical 
to construct special achievement tests for use as selection devices, 
indexes already available can and should be applied as the initial 
hurdle. 

Final selection is made, as before, on the basis of appropriate 
aptitude tests and an interview directed at such personal requirements 
(diction, neatness, appearance, etc.) as may be considered important 
to the positions that the applicants will be expected to fill. At the more 
advanced levels, proficiency tests in typing, shorthand, and other skills
presumably acquired in earlier courses also may have to be added to
the selection procedure. But this poses no serious developmental 
problems, since proficiency tests do not require "cultural" modifica- 
tions, and standard techniques can be applied. 

If the numbers are large, it is appropriate to administer the 
aptitude tests and the interviews (plus proficiency measures) in 
separate stages. But splitting the aptitude tests themselves is not
necessary in clerical selection, because these kinds of abilities can 
be measured reliably in less time than technical skills, and it is 
simpler to administer the entire series than to set up for two separate 
sessions. 

The I-D tests used for these applications include the Verbal 
Analogies, Coding, Names, Table Reading, and Arithmetic Tests, 
administered in that order. Either the Low form or High form of 
the Verbal Analogies Test may be used, in accordance with the appli- 
cants' educational level; and, since the instructions for these two 
tests are identical, groups of mixed educational backgrounds can be 
tested at the same time by giving each examinee the appropriate 
test paper once the explanations have been completed. Approximately 
one-and-a-half hours is required for the five-test series. 
The statistical properties of these tests are summarized in
Table 20. From these data, a composite reliability above .92 and a
composite validity of approximately .62 can be projected. The
equivalent gain in trainee performance is charted in Figure 25.
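
The handbook does not state how a validity coefficient is converted
into the percentages charted in the figures. One common way to
approximate such a conversion is sketched below. It is an assumption
of this example, not a documented method of the handbook, that test
and criterion scores are bivariate normal, that "the present average
student" sits at the criterion mean, and that stanines 6 through 9
begin at the conventional z-scores of 0.25, 0.75, 1.25, and 1.75. All
names are hypothetical.

```python
import math


def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def norm_cdf(x):
    """Standard normal cumulative distribution."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def share_above_average(validity, cutoff_z, upper=8.0, steps=20000):
    """Share of candidates scoring above cutoff_z on the test who are
    expected to exceed the criterion mean, under bivariate normality."""
    # For a given test score x, the chance of an above-average
    # criterion score is norm_cdf(slope * x).
    slope = validity / math.sqrt(1 - validity ** 2)
    width = (upper - cutoff_z) / steps
    joint = 0.0
    for i in range(steps):  # midpoint-rule integration over the selected tail
        x = cutoff_z + (i + 0.5) * width
        joint += norm_pdf(x) * norm_cdf(slope * x) * width
    return joint / (1 - norm_cdf(cutoff_z))


# Conventional lower z-score bounds of stanines 6 to 9.
STANINE_LOWER_BOUNDS = {6: 0.25, 7: 0.75, 8: 1.25, 9: 1.75}

for stanine, z in STANINE_LOWER_BOUNDS.items():
    share = share_above_average(0.62, z)
    print(f"minimum stanine {stanine}: {share:.0%} expected above average")
```

The sketch ignores any preselection by "present standards," so its
output should be treated as a rough guide to the shape of the
relationship rather than as a reconstruction of Figure 25.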


TABLE 20

The I-D Clerical Selection Series**


        COD   NAM   TAB   RTH   Reliability   Validity*

VAL     .26   .35   .35   .34       .87          .55
VAH     .20   .12   .22   .18       .75          .54
COD           .42   .43   .31       .87          .34
NAM                 .45   .39       .73          .39
TAB                       .34       .91          .44
RTH                                 .83          .40


*Average coefficients for samples at varying educational levels.

**Details on the numbers of examinees, criterion instruments,
etc., may be found in the I-D Technical Manual.


FIGURE 25

Validity of the I-D Clerical Selection Tests

[Bar chart. Rows (Minimum Requirements for Admission): present
standards only; present standards and I-D Stanine 6, 7, 8, and 9.
Bars (Percentage Selected Likely to Perform Above Level of Present
Average Student): 50% for present standards only and 96% with
I-D Stanine 9; one intermediate value reads 84%, and the remaining
values are illegible.]
For vocational schools that offer both technical and business
courses, some differential placement is possible on the basis of these 
tests and corresponding tests of the technical aptitude series. Largely 
as a result of the high internal consistency of the clerical series, the 
correlation between a composite of Coding, Names, and Table Reading 
and a composite of Boxes, Figures, and Manual Dexterity is below .45, 
which affords sufficient uniqueness for placement decisions. In these 
applications, the applicants can be screened initially on the Verbal 
Analogies and Arithmetic Tests, and then given the six-test technical/ 
clerical series as part of the final selection procedure. 


NOTE 

1. This coefficient of uniqueness is explained in J. C. Flanagan, 
Technical Report, Flanagan Aptitude Classification Tests (Chicago: 
Science Research Associates, 1959). 
CHAPTER



THE FEASIBILITY
OF A
CENTRALIZED
INSTITUTION


In many developing countries, testing reform requires not only
the test-related investments considered in Chapter 2 but also a
substantial expansion of the professional and logistic infrastructure.
Additional specialists must be trained, adequate facilities must be
provided, and new bureaucracies must be created. The basic mechanisms
for effective testing do not exist, and the priority need is for
the requisite capabilities and institutions.

In these situations the cost of testing reform is so high that it
can normally be undertaken only at the central government level,
with a sizable investment of public funds and substantial external
assistance. Given these much higher stakes, the cost-effectiveness
questions first raised in Chapter 2 must be explored even more
thoroughly and extended to a variety of other nontechnical issues.


THE MAXIMUM PAYOFF POTENTIAL 

The first step, as before, is to enumerate the benefits that are
likely to be realized from the project if it is fully successful. These
include the improvements that should result from the specific testing
services that will be provided, and such additional benefits as can be
expected from the availability of expanded professional resources.


Improvements in Testing 

Proposals for testing reform or requests for technical assistance
to testing are usually the result of certain specific pressures.
Something has gone wrong, and someone reasonably important wants
it corrected. The first AID/AIR [...]
of the problems of the technical training institutes that the U.S.
government was supporting; each subsequent [...] has been
triggered by one major need. "We must solve Problem A" has been
the normal beginning.

If institution-building is not required, such a proposal can reasonably
be evaluated on the basis of the payoff of having better tests to
apply to Problem A, and of the cost of developing these better tests,
as already outlined in Chapter 2. No other issues need be considered.
But if institution-building also must be undertaken, the analysis cannot
be this simple; the institution-building component will add an "overhead"
figure of perhaps 2,000 or 12,000 percent to the basic cost of the
testing program, and the payoff of Problem A will seldom justify so
large an investment. A feasibility study based on a single problem
will almost always result in negative recommendations.


But countries that have Problem A typically have also a Problem
C, a Problem P, etc., and the sum of the payoffs for a combination
of these could make institution-building a prudent investment. And it
is therefore suggested that the appropriate response to an invitation
to help solve Problem A is to offer instead to examine the full range
of problems related to testing. A feasibility study of lesser scope
will usually turn out to have been itself cost-ineffective.
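
The arithmetic behind this argument can be made concrete with a small
sketch. Every figure below is hypothetical and chosen only for
illustration; the only number taken from the text is the 2,000 percent
overhead, and the function and field names are assumptions of this
example.

```python
def net_payoff(problems, institution_cost):
    """Summed payoffs minus test development costs and one shared
    institution-building cost."""
    total_payoff = sum(p["payoff"] for p in problems)
    total_test_cost = sum(p["test_cost"] for p in problems)
    return total_payoff - (institution_cost + total_test_cost)


# Hypothetical basic cost of the testing program for Problem A.
basic_cost_a = 50_000

# Institution-building overhead of 2,000 percent of that basic cost.
institution_cost = basic_cost_a * 2000 / 100

# Hypothetical payoffs and test development costs for three problems.
problems = [
    {"name": "Problem A", "payoff": 400_000, "test_cost": basic_cost_a},
    {"name": "Problem C", "payoff": 500_000, "test_cost": 40_000},
    {"name": "Problem P", "payoff": 450_000, "test_cost": 60_000},
]

single = net_payoff(problems[:1], institution_cost)
combined = net_payoff(problems, institution_cost)

print(f"Problem A alone:    {single:,.0f}")
print(f"All three problems: {combined:,.0f}")
```

With these invented figures the single-problem study shows a net loss
while the combined inventory shows a net gain, because the
institution-building cost is incurred once and spread across all of
the problems.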


Thus, the assessment should properly begin with a comprehensive 
inventory of the testing problems that a professional center might 
help to resolve in this country. And the procedures of Chapter 2 
should then be applied to each of these problems, using the appropriate 
one of the three methods of payoff projection suggested. As before, 
improvements in both accuracy and logistics should be considered, 
and a reasonable degree of quantification should be attempted. 


Since the objective at this stage is to obtain a maximum possible 
payoff projection, virtually all types of testing problems should be 
included in the assessment. But care should also be taken to ensure 
that each of the needs that can be evaluated at only the lowest level 
the magnitude of the problem 
case in point. In nearly every survey that was conducted, the introduction 
of effective guidance procedures was strongly supported as 
one of the country’s most critical testing needs. And yet, there were 
only isolated cases in which guidance tests, once developed, actually 
could be applied. The quantities and varieties of educational opportunity 
that are the "so what" of guidance did not exist, and the system 
as a whole did not provide the necessary flexibility for branching to 
different pursuits. Though guidance could quite properly be considered 
an essential component of the educational process from a philosophic 
point of view, it could not be implemented in these countries without 
a host of accompanying systems changes that were not likely to be 
made in the foreseeable future. Including improved guidance procedures 
among the outcomes expected would have distorted the payoff pro- 
jections. 

Similarly, there may be considerable interest in the identification 
of exceptionally able youngsters in "disadvantaged" rural areas, so 
that their talents will not be wasted, as seems now to be the case in 
many locations. In an earlier discussion, it was noted that tests for 
this purpose can in fact be constructed, and can quite accurately 
identify talents that are passed over by the traditional examination 
procedures. But for actual payoff, identifying them is not enough. 

To transition to a quality school, many of these talented youngsters 
may need a period of intensive remedial learning. Will opportunities 
for this be provided? To be able to stay in school, they will probably 
need special financial assistance. Is government prepared to grant 
such support? Unless the results of the new tests can actually be 
translated into the appropriate action, little payoff can be expected. 

And the survey must for this reason look beyond the development of 
tests not now available in the country, and consider also the arrange- 
ments that have been made for practical implementation. 

A second class of inappropriate applications consists of the 
unfortunately large number of legitimate test needs which are not 
yet within the state of the art. That these include some of the more 
exciting payoffs is regrettable, but should not sway the appraisal, 
for although it may be within the realm of possibility for the proposed 
project to achieve the methodological breakthroughs required, this is 
scarcely the most likely result. The selection of agricultural extension 
trainees who will be content to work in the rural areas rather than to 
flock to the big cities, for example, requires the measurement of 
characteristics that have proved elusive even in countries with long 
testing traditions, and the survey should not presume that the task 
will be easier in a developing country. For all payoffs that depend 
on the measurement of such slippery factors as "personality" or 
"motivation," past findings suggest a pessimistic appraisal. 
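
The screening of the inventory described above can be sketched as a 
simple filter. The field names, the sample entries, and the payoff 
figures are all hypothetical; the two flags merely stand for the two 
classes of inappropriate applications discussed, namely needs whose 
results cannot be acted upon and needs not yet within the state of the 
art. 

```python
from dataclasses import dataclass


@dataclass
class TestingNeed:
    name: str
    projected_payoff: float     # hypothetical units
    implementable: bool         # arrangements exist to act on the results
    within_state_of_art: bool   # measurable with methods now available


def realistic(inventory):
    """Keep only needs that can be both measured and acted upon."""
    return [n for n in inventory
            if n.implementable and n.within_state_of_art]


# Illustrative entries only; the payoff values are invented.
inventory = [
    TestingNeed("selection for technical training", 10.0, True, True),
    TestingNeed("educational guidance", 15.0, False, True),
    TestingNeed("extension-trainee motivation", 20.0, True, False),
]

kept = realistic(inventory)
print([n.name for n in kept])
print(sum(n.projected_payoff for n in kept))
```

Only the payoffs of the needs that survive the filter would then be 
summed in the projection. 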
Deleting these kinds of unrealistic applications from the inventory 
initially compiled will reduce the list, but even this reduced 
inventory will generally result in an impressive profile of payoff 
are naturally oriented toward these examinations as the de facto goal 
of all education. 

In theory, this is fully as workable a system as the quality- 
point grading methods that American educators tend to prefer. If the 
examination accurately reflects the skills and values that the country's 
youth should be developing as a result of their schooling, it is in fact 
the superior procedure, since the school-to-school variations in 
grading standards inherent in the American system are automatically 
avoided. But in practice, the actual examinations are typically found 
to be very much out of date. Their content may emphasize the skills 
that are appropriate for the tiny minority of the population that will 
eventually reach the highest educational levels, a carryover from 
colonialist or elitist days. Their format may encourage the 
learning of assorted factual information, a remnant of former 
pedagogic approaches. And, where this is the case, the examinations can be 
legitimately regarded as a serious 
easily change it. But a professional testing center may afford a partial 
solution. A senior official in one country claimed that since the 
establishment of the national testing center, he has been able to satisfy 
his many petitioners by telling them that the law requires testing, that 
there is a long waiting list for testing, but that he would personally 
arrange an early appointment for the candidate to give him the best 
possible chance. And the records of the testing center confirmed that 
he had indeed written many letters of "personal recommendation" for 
testing, effecting an immediate social change without violating tradition. 

Again, these and similar outcomes cannot be taken for granted. 
But the provision of an alternative in a situation that now offers no 
other options clearly should be included in the payoff projection. 


Other Research Contributions 

A third category of payoffs derives from 
needed to staff a professional testing center are the 
needed for a wide variety of other behavioral 